Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading metadata methods on PNG images are discarding some custom key/value #147

Open
MarcoMartins86 opened this issue Oct 31, 2017 · 21 comments
Labels
enhancement feature / functionality enhancements
Projects
Milestone

Comments

@MarcoMartins86
Copy link

Hi, I am using this lib to read all sorts of image metadata and while I was trying to figure it out why some metadata values on PNG images did not appear on the PngImage class data, I notice they were found but discarded inside the PngChunk::parseChunkContent. Although this seems to be by design I think this can also be interpreted as an issue. Since this information could be saved on PngImage class as a container like the other metadata types, and later on the write metadata for this type of metadata fields could also be implemented.

@clanmills
Copy link
Collaborator

The reason we don't support this metadata is two fold:

  1. We support Exif, IPTC, XMP and ICC
  2. This was only brought recently to my attention: http://dev.exiv2.org/issues/1312

Your analysis of how to implement this is correct. This would involve adding another metadata container. We have a long term goal (and working prototype) to support a single "unified" metadata container. Currently Team Exiv2 has a full pipeline of scheduled activity. Without another engineer to embrace this topic, implementation isn't scheduled (hence target 1.0).

If you would like to join the team and work on this subject, I assure you of a warm welcome and great colleagues who will mentor and support you.

As this topic has only been raised recently, I'm don't even know the extend of the metadata involved. Even if you don't want to get involved in implementation, you could make a very valuable to investigate the "scope" of the metadata involved. To convert your "some" metadata into a table (or spec) of "PNG Metadata Fields".

Let me condense the wide-ranging discussion on Redmine with Arnold using his files http://dev.exiv2.org/attachments/download/1184/itxt-german.png
and
http://dev.exiv2.org/attachments/download/1183/itxt2.png

$ exiv2 -pR http://dev.exiv2.org/attachments/download/1183/itxt2.png
STRUCTURE OF PNG FILE: http://dev.exiv2.org/attachments/download/1183/itxt2.png
 address | chunk |  length | data                           | checksum
       8 | IHDR  |      13 | ...[...E.....                  | 0x52edaae4
      33 | gAMA  |       4 | ....                           | 0x0bfc6105
      49 | sBIT  |       4 | ....                           | 0x4da52df6
      65 | bKGD  |       6 | ......                         | 0x95cd2f20
      83 | pCAL  |      44 | bogus units...........foo/bar  | 0x57407b1c
     139 | pHYs  |       9 | .........                      | 0xd2dd7efc
     160 | tIME  |       7 | .....:.                        | 0x8eff267a
     179 | tEXt  |       9 | Title.PNG                      | 0xdc017935
     200 | iTXt  |      39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
     251 | zTXt  |      26 | test.....N.)-Qx.P.........     | 0xa869e99d
     289 | IDAT  |    4828 | x..\+........EFFb...X$.......D | 0x353e27e9
    5129 | zTXt  |     202 | Description....M.Mj.@...>.[... | 0xa9a15024
    5343 | iTXt  |     111 | Warning...de.WARNING........0. | 0xb3adddee
    5466 | IEND  |       0 |                                | 0xae426082
$ exiv2 -pR http://dev.exiv2.org/attachments/download/1184/itxt-german.png
STRUCTURE OF PNG FILE: http://dev.exiv2.org/attachments/download/1184/itxt-german.png
 address | chunk |  length | data                           | checksum
       8 | IHDR  |      13 | ...[...E.....                  | 0x52edaae4
      33 | gAMA  |       4 | ....                           | 0x0bfc6105
      49 | sBIT  |       4 | ....                           | 0x4da52df6
      65 | pCAL  |      44 | bogus units...........foo/bar  | 0x57407b1c
     121 | tIME  |       7 | .....:.                        | 0x8eff267a
     140 | bKGD  |       6 | ......                         | 0x95cd2f20
     158 | pHYs  |       9 | .........                      | 0xd2dd7efc
     179 | tEXt  |       9 | Title.PNG                      | 0xdc017935
     200 | iTXt  |      39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
     251 | IDAT  |    4000 | x..\+........EFFb...X$.......D | 0x199b9fd7
    4263 | IDAT  |     831 | .......#T..6.....`....G...(<.. | 0xa028a770
    5106 | zTXt  |     202 | Description..x.M.Mj.@...>.[... | 0x692a52f9
    5320 | iTXt  |     111 | Warning...de.WARNING.x......0. | 0xb5cd3e90
    5443 | iTXt  |      65 | Deutsch...de..Steinstra..e 10, | 0x23616b74
    5520 | IEND  |       0 |                                | 0xae426082
$ 

I believe Exiv2 ignores chunks such as Author, Description, Warning and Deutsch which are not part of the supported formats (Exif, IPTC, XMP and ICC). However, I'm sure there are other fields and we need a "spec" to understand the extend of the effort involved.

Another word of encouragement to "sign up" for this challenge is to say that I have found working with Arnold to be a pleasant and enjoyable experience. When you work with him, he'll be quick to test, comment and encourage you. A very nice guy. wiegert@telus.net

@clanmills
Copy link
Collaborator

I'm going to close this for now. The issue of providing additional PNG metadata support is logged in Redmine http://dev.exiv2.org/issues/1312 I've updated Redmine#1312 with a pointer to this discussion. If @MarcoMartins86, or anybody else, wishes to discuss this matter further, we can reopen this issue.

@MarcoMartins86
Copy link
Author

Hi @clanmills , thank for your quick reply. So I might be able to help in the development of this feature, do I contact Arnold directly or is there another process to get more information in this matter?

@clanmills
Copy link
Collaborator

Very good. Yes, let Arnold know that you are willing to work on this. He’ll be delighted and will work with you to test your code. Arnold is a ‘friendly user’ and can’t help you to write the code. It appears to me that you have understood how to implement this feature. I’ll send you an invitation to join the team. Before you write any code, we need to ‘scope’ the feature to know what’s involved.

@clanmills clanmills reopened this Nov 2, 2017
@clanmills
Copy link
Collaborator

clanmills commented Nov 2, 2017

Here's a document about PNG metadata: https://docs.oracle.com/javase/8/docs/api/javax/imageio/metadata/doc-files/png_metadata.html

That's a good start, however it's not the whole story as that spec explicitly defines chunk types such as ICC, Palette, ColorSpace and others. However zTXt and tEXt are chunks that contain structured data:

540 rmills@rmillsmbp:~/temp $ grep ELEMENT pngmetadata.txt 
  <!ELEMENT "javax_imageio_png_1.0" (IHDR?, PLTE?, bKGD?, cHRM?,
    <!ELEMENT "IHDR" EMPTY>
    <!ELEMENT "PLTE" (PLTEEntry)*>
      <!ELEMENT "PLTEEntry" EMPTY>
    <!ELEMENT "bKGD" (bKGD_Grayscale | bKGD_RGB | bKGD_Palette)>
      <!ELEMENT "bKGD_Grayscale" EMPTY>
      <!ELEMENT "bKGD_RGB" EMPTY>
      <!ELEMENT "bKGD_Palette" EMPTY>
    <!ELEMENT "cHRM" EMPTY>
    <!ELEMENT "gAMA" EMPTY>
    <!ELEMENT "hIST" (hISTEntry)*>
      <!ELEMENT "hISTEntry" EMPTY>
    <!ELEMENT "iCCP" EMPTY>
    <!ELEMENT "iTXt" (iTXtEntry)*>
      <!ELEMENT "iTXtEntry" EMPTY>
    <!ELEMENT "pHYS" EMPTY>
    <!ELEMENT "sBIT" (sBIT_Grayscale | sBIT_GrayAlpha | sBIT_RGB |
      <!ELEMENT "sBIT_Grayscale" EMPTY>
      <!ELEMENT "sBIT_GrayAlpha" EMPTY>
      <!ELEMENT "sBIT_RGB" EMPTY>
      <!ELEMENT "sBIT_RGBAlpha" EMPTY>
      <!ELEMENT "sBIT_Palette" EMPTY>
    <!ELEMENT "sPLT" (sPLTEntry)*>
      <!ELEMENT "sPLTEntry" EMPTY>
    <!ELEMENT "sRGB" EMPTY>
    <!ELEMENT "tEXt" (tEXtEntry)*>
      <!ELEMENT "tEXtEntry" EMPTY>
    <!ELEMENT "tIME" EMPTY>
    <!ELEMENT "tRNS" (tRNS_Grayscale | tRNS_RGB | tRNS_Palette)>
      <!ELEMENT "tRNS_Grayscale" EMPTY>
      <!ELEMENT "tRNS_RGB" EMPTY>
      <!ELEMENT "tRNS_Palette" EMPTY>
    <!ELEMENT "zTXt" (zTXtEntry)*>
      <!ELEMENT "zTXtEntry" EMPTY>
    <!ELEMENT "UnknownChunks" (UnknownChunk)*>
      <!ELEMENT "UnknownChunk" EMPTY>

For example the metadata zTXt is a "container" for Exif, IPTC and XMP.

544 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2 $ exiv2 -pS test/data/ReaganLargePng.png 
STRUCTURE OF PNG FILE: test/data/ReaganLargePng.png
 address | chunk |  length | data                           | checksum
       8 | IHDR  |      13 | ............                   | 0x8cf910c3
      33 | zTXt  |    8461 | Raw profile type exif..x...iv. | 0x91fbf6a0  <--- EXIF hides here
    8506 | zTXt  |     636 | Raw profile type iptc..x..TKn. | 0x4e5178d3  <--- EXIF hides here 
    9154 | iTXt  |    7156 | XML:com.adobe.xmp.....<?xpacke | 0x8d6d70ba  <--- XMP hides here
   16322 | gAMA  |       4 | ....                           | 0x0bfc6105
   16338 | iCCP  | 1151535 | ICC profile..x...UP.........!! | 0x11f49e31   <--- ICC here (and in the "high level" spec).
 1167885 | bKGD  |       6 | ......                         | 0xa0bda793
 1167903 | pHYs  |       9 | ...#...#.                      | 0x78a53f76
 1167924 | tIME  |       7 | ......2                        | 0x582d32e4
 1167943 | zTXt  |     278 | Comment..x.}..n.@....O..5..h.. | 0xdb1dfff5
 1168233 | IDAT  |    8192 | x...k.%.u%....D......GWW...ER. | 0x929ed75c
 1176437 | IDAT  |    8192 | .F('.T)\....D"]..."2 '(...D%.. | 0x52c572c0
 1184641 | IDAT  |    8192 | y-.....>....3..p.....$....E.Bj | 0x65a90ffb
 1192845 | IDAT  |    8192 | ....S....?..G.....G........... | 0xf44da161
 1201049 | IDAT  |    7173 | .evl...3K..j.S.....x......Z .D | 0xbe6d3574
 1208234 | IEND  |       0 |                                | 0xae426082
545 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2 $ 

I recommend that you and Arnold discuss this together. I'm happy to join your discussion, although I don't know much about PNG.

We use Slack in Team Exiv2 as our Instant Messenger. If you provide your email address, I'll send you an invitation. If you want to keep your email address private, please drop me an email: robin@clanmills.com

@MarcoMartins86
Copy link
Author

Hi Robin, I have sent you a private email a couple of days ago. Since I am still not on your group in Slack I assume that you didn't receive it or that you still didn't have the time. Reach me if you need me to send it again.

@D4N
Copy link
Member

D4N commented Nov 7, 2017

Robin is currently on vacation, so it might take a little longer before he can reply.

@MarcoMartins86
Copy link
Author

MarcoMartins86 commented Nov 20, 2017

After checking what was already done in the code and after reading http://www.libpng.org/pub/png/spec/iso/index-object.html chapter 11.3.4 Textual information. It looked to me that we are already successfully parsing mostly if not all the formats described in the mentioned chapter and then discarding the data. So my proposed approach is to add a new pngti.hpp and respective pngti.cpp where a new container for these different types of data will be defined and implemented so it can be used inside the PngImage class and filled on its readMetadata method. In the meanwhile, I will do any necessary adjustments to what is now done on parsing. So my goal will be mainly to read into an appropriate data container the Textual information metadata of PNG images. After this is done and merged I will think of how to write these types of metadata.

@tester0077
Copy link
Collaborator

Hi Marco,
I'll keep an eye on this topic, though I'm off chasing other 'rabbits' down their holes ;-)
Whenever possible, I'll try to test any changes you make, though it will likely mean I'll have to resurrect some older code setup.

@clanmills
Copy link
Collaborator

Marco:

Well done to isolate 11.3.4 in http://www.libpng.org/pub/png/spec/iso/index-object.html Very useful. That identified the following elements:

Title | Short (one line) title or caption for image
Author | Name of image's creator
Description | Description of image (possibly long)
Copyright | Copyright notice
Creation Time | Time of original image creation
Software | Software used to create the image
Disclaimer | Legal disclaimer
Warning | Warning of nature of content
Source | Device used to create the image
Comment | Miscellaneous comment

And, I think pngchunk.cpp will parse and rewrite chunk for you (I can't remember, I haven't looked at the code recently). If you set a breakpoint on pngimage.cpp readMetadata() and writeMetadata() you'll discover how the code works. Let me know if you want to discuss it.

However parsing and writing the metadata is only part of the challenge. We have to solve how to "address" this metadata. Exiv2 has three "families" of metadata (Exif, Iptc and Xmp) which are subdivided by "group" and "tag". For example: Exif.Image.Make. We could use use Exif.Png.Description etc..., however this isn't Exif metadata. (Nor Iptc, nor Xmp). We could introduce a family "Png" (with a PngMetadata vector API). As things stand this would be Png.Description. Can the code deal with duplex X.Y keys, or does it have to be triple X.Y.Z key? I see there are other types of metadata in 11.3.5 Miscellaneous information. Perhaps they should also be supported. So, then we fix for the X.Y keys, as we can have Png.Text.Description etc, Png.Misc.Background.

Can you consider is how this metadata can (or cannot) be supported in the .exv file. This is an extracted metadata file (it's something like a JPEG with no image). Because it's a JPEG, it can write Exif, Iptc, Xmp (and ICC profiles) using the embedding scheme for JPEG. Perhaps we can use another APP1 segment to store the PNG metadata. ExifTool supports .exv and we should be careful to maintain compatibility.

Some other matters:

  1. The "unified metadata container"
    Perhaps we can introduce that into Exiv2 v0.27 with PNG metadata support and avoid introducing more metadata containers.
  2. The command-line switches. Eg -px will print only Xmp. I think -pj is available for print only Jpg.
  3. What updates are required to the samples?
  4. Update to the test harness
  5. Update to exiv2.1 (the man page)

Incidentally, the Video metadata was all added as Xmp.Video.XXX I wasn't involved with the Video project, however I believe the "Xmp" family was hi-jacked to present video metadata without introducing another "family". I don't think we should do that again by introducing Xmp.Png.Description etc.

As you can see, there's a little more to this than appears at first glance (and my second glance will also contain omissions). However don't get discouraged - you'll get there. I'm hoping that v0.27 with security fixes will reach RC1 in December and be released in 2018. This "Png Metadata" can target v0.28.

@MarcoMartins86
Copy link
Author

Robin,

Yes, I think it is best to introduce a new PngMetadata vector and so implement the "unified metadata container" concept just to this type, since in the future it seems that all metadata will be using this concept (just remind me where can I find the functional example of this code?).
I was not thinking in subdividing by group or family but by a kind of known paths logical approach, so X.Y.K or X.Y.Z.K doesn't matter, from left to right the first unknown part of the path was the key.

Proposed steps:

  1. Add the unified metadata container for a PngMetadata vector and store the chosen metadata types (I can try to add them all but not sure if this is the correct approach).
  2. Implement a new command-line switch for printing PngMetadata.
  3. Try to solve the exported .exv and the new metadata problem.
  4. Check what can be done to update the tests on reading PngMetadata and add new samples.
  5. Merge the code to master here.
  6. Update the man pages (read part).
  7. Make it possible to write PngMetadata and deal with the different types constraints.
  8. Make it possible to import a .exv file to an image.
  9. Write test and samples related to writing PngMetadata.
  10. Merge the code to master here.
  11. Update the man pages (write part).

@clanmills
Copy link
Collaborator

The "unified metadata container" code is in the "unstable" branch on SVN. Something like svn://dev.exiv2.org/svn/branches/unstable I built that code last year and I believe it's working. However, I have not studied it nor stepped it in the debugger to discover how it works. Getting that into Exiv2 would be a "headline" feature for v0.28.

I think you should add the 12 elements above (Title .. Comment) as Png.Text.Title, Png.Text.Comment You can easily add Misc and other parts of the PNG spec if requested.

However, let's not be in a hurry to merge "unified metadata container" with master. I'd like to start work on releasing master as v0.27 in December and hope to reach RC1 for Christmas. After that, we'll probably have a 0.27 branch to which we'll selectively add fixes and mods in 2018. Master will then be the 0.28 development branch.

I'm in Vietnam at the moment for the wedding of Tuan and Alice on Saturday 25 November. Tuan was a GSoC student in 2013 and contributed the "web ready" feature. I'll be home in England on 30 November. Hope to start work on releasing v0.27 on Monday 4 December.

@D4N
Copy link
Member

D4N commented Nov 23, 2017

@clanmills Do you know if there are any further branches on svn that are not on git/github? I think we should migrate them, otherwise we are going to forget / loose them.

@MarcoMartins86 Thanks for working on this. Please keep in mind to keep the PRs in a reviewable size, as others must review them before being merged. We are going to write a contribution guide soon-ish where all of this is going to be explained.

@clanmills
Copy link
Collaborator

I only discovered about the "unstable" branch when I visited Andreas last year. There is a "video write" branch, however I don't think it's worth saving. The video-read code is in the code base and needs a lot of work. The video-write code is based on that. Regrettably, I don't think it's worth salvaging.

There is an SVN "team" repos: svn://dev.exiv2.org/svn/team which has useful stuff that doesn't belong in 'master' such as the Logo, some drawings and meeting notes. Importantly, I moved 'website' there a few months ago when somebody complained about it being in 'master'. If you move 'website' to Git, I will be unable to work on the release in December as I have found it impossible to work with git.

@D4N
Copy link
Member

D4N commented Nov 24, 2017 via email

@clanmills
Copy link
Collaborator

I think we can leave svn://dev.exiv2.org/svn/team (and unstable) in svn for now. If it's not broken, don't change it. What you decide for the longer term, is up to you. I believe GitHub provides a "wiki" in which we can place the team resources without them being downloaded to every clone of 'master'.

@clanmills
Copy link
Collaborator

This is related to the Unified Metadata Container which is a top-level feature of v1.00. I'm closing this issue as this topic because it will be dealt with in #1505 and other related issues.

@clanmills
Copy link
Collaborator

I'm going to close this to reduce the number of open issues. The Unified Metadata Container and Family PNG are a top-level feature of v1.00 and will be delivered.

@piponazo
Copy link
Collaborator

piponazo commented Jan 7, 2022

Hi,

Should we reopen this issue and assign it to the project "1.0"? I think in this issue there is relevant information to be considered for extending the parsing capabilities for PNG files.

Although #1505 is a super-set capturing this PNG-parsing feature, one might think about first extending the PNG parsing capabilities without addressing the major "Unified Metadata Concept" feature.

@clanmills
Copy link
Collaborator

clanmills commented Jan 7, 2022

@piponazo. Yes. I think this should be re-opened with a target of v1.00. Earlier this year I built the "unstable" branch and it works. We need to study that code and port it to 'main'. This is of great value to Exiv2 and could be the defining feature of v1.00.

I discuss PNG "Textual information" in my book (Author etc). So to apply the Exiv2 Tag syntax (Family.Group.Name), we could use "PNG.Text.Author". The PNG Specification permits users to define other keywords. I think every member of the PNG Family will be in group Text, although there could be other groups such as comments.

The reason that the Unified Metadata Container is necessary is to avoid introducing another family metadata vector similar to Exiv2::ExifData, IptcData and XmpData.

11.3.4.2 Keywords and text strings

The following keywords are predefined and should be used where appropriate.

Title Short (one line) title or caption for image
Author Name of image's creator
Description Description of image (possibly long)
Copyright Copyright notice
Creation Time Time of original image creation
Software Software used to create the image
Disclaimer Legal disclaimer
Warning Warning of nature of content
Source Device used to create the image
Comment Miscellaneous comment

Other keywords may be defined for other purposes. Keywords of general interest can be registered with the PNG Registration Authority. It is also permitted to use private unregistered keywords.

@piponazo piponazo added this to the v1.00 milestone Jan 7, 2022
@piponazo piponazo reopened this Jan 7, 2022
@svgeesus
Copy link

svgeesus commented Nov 7, 2023

11.3.4.2 Keywords and text strings
The following keywords are predefined and should be used where appropriate

There is a slightly updated list in the official PNG Third Edition spec which adds XML:com.adobe.xmp. This is widely deployed.

In addition the PNG Extensions spec, v1.6.0 adds the Collection keyword. I haven't seen this one in the wild.

The eXIf Exchangeable Image File (Exif) Profile has also been moved from the extensions spec into PNG Third Edition, as it is widely used. One clarification was added: conforming to browser handling of eXIf in particular orientation: browsers only honor this if that chunk precedes IDAT while the extensions spec left it open where eXIf could go. PNG Third Edition requires that, like most other chunks apart from animation frames, it be before IDAT. See Chunk ordering

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement feature / functionality enhancements
Projects
Exiv2 1.00
Awaiting triage
Development

No branches or pull requests

7 participants