
Improved charset handling in UserComment: what does it mean? #1258


Closed
norbertj42 opened this issue Aug 3, 2020 · 34 comments

@norbertj42

One of the highlights of Exiv2 v0.27.3 is "Improved charset handling in UserComment". I scrolled through the change list but did not find any details about this item.
What I observed: in the previous version, toString() returned the value including the leading "charset=...", whereas print() returned it without. Now both return the leading "charset=...".
Is this changed behaviour the improved handling, or did something else change?
From my point of view the previous behaviour was better: print() returned just the value, which is good for users who only want to see the value, whereas toString() gave the additional charset information for users interested in it.

@clanmills clanmills self-assigned this Aug 3, 2020
@clanmills clanmills added this to the v0.27.4 milestone Aug 3, 2020
@clanmills
Collaborator

Good question. I recommend that you read the discussion with Phil Harvey of ExifTool about this: #1046

I updated the man page exiv2.1 to explain how this is intended to work. Before v0.27.3 it was not documented.

I find charset (and all Unicode/non-ASCII) matters difficult to understand. I have a new contributor from China working with me at the moment on #1215 (currently PR #1257), and I have been thinking about asking him to review and test the charset code after he completes his current assignment.

To address your concern about the API toString(), let me make a couple of observations:

1 In the past, toString() could emit binary data, which caused trouble in the test suite because it uses the utility diff to compare files. Not only are the arguments to diff platform-dependent, binary diff on some platforms (Solaris, I think) requires the utility bdiff. These horrors are being addressed in #1215.

2 You can get the "raw" value of the stored bytes with another API. I think it's toValue(), but I don't remember exactly; I believe this is discussed in #1046.

If you'd like to get involved in this, I would be delighted to accept your help.

@norbertj42
Author

Thanks for the details and references. I have read them, but my C++ is too poor to understand them completely. Some time ago I also read thread #662 "Incorrect Unicode encoding of Exif UserComment tag", because I had noticed that a UserComment written by one program is often not displayed correctly by other programs if the text contains German umlauts (ä, ö, ü). But by using toString() and, if needed, converting the encoding, I was able to display UserComment in a readable way for most of the images written by the other metadata editors I tried.

Looking more closely now, I noticed that this "compatibility" was lost when I migrated to 0.27.3, because in some cases I now see only "charset=Ascii binary comment". Based on the discussion in #1046, I will check whether the other programs write the tag according to the specification. Even if the error is on their side, it would be great to have exiv2 as tolerant as it was in 0.27.2.

I would like to help and give something back to this project, which made my program possible. But I only know the basics of C++, and my environment is Windows and Microsoft Visual Studio. If this helps, please let me know.

@clanmills
Collaborator

Well the list of things TODO never gets shorter. Not because I'm not making progress, it's because we live in an expanding universe!

Here's something you could explore with your tools: #1255. By default, vcpkg builds a 32-bit exiv2 and:
1 Doesn't build the sample applications
2 Doesn't copy the DLLs into build/bin

It would be good to figure out how to run the test suite on that - especially when Leo finishes #1215.

A matter closely related to #1255 is building libiconv on Windows. I see that vcpkg builds libiconv and links it into Exiv2. I'd like to test that with the UserComment. I recently added CMake code to guarantee that Exiv2/MSVC never links libiconv on 0.27-maintenance. I did that because a user kept linking Cygwin64/libiconv with Exiv2 and sending me bug reports.
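For context on why libiconv matters here: a charset=Unicode UserComment stores UCS-2 code units after its 8-byte header, and Exiv2 converts them to UTF-8 through its own convertStringCharset() helper (see the comment() code quoted later in this thread). The following is a minimal sketch of that conversion using the POSIX iconv() API directly, assuming little-endian data and an iconv implementation (such as glibc's or GNU libiconv) that recognises the names "UCS-2LE" and "UTF-8". The function name ucs2leToUtf8 is hypothetical and is not part of Exiv2.

```cpp
#include <iconv.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an Exiv2 API): convert little-endian UCS-2 bytes
// to UTF-8 with POSIX iconv. Assumes iconv knows the name "UCS-2LE".
std::string ucs2leToUtf8(const std::string& ucs2) {
    iconv_t cd = iconv_open("UTF-8", "UCS-2LE");
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("iconv_open failed");
    }
    std::string in = ucs2;                       // iconv wants a mutable input buffer
    std::string out(ucs2.size() * 2 + 4, '\0');  // each 2-byte unit needs at most 3 UTF-8 bytes
    char* inPtr = &in[0];
    std::size_t inLeft = in.size();
    char* outPtr = &out[0];
    std::size_t outLeft = out.size();
    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("iconv conversion failed");  // e.g. odd-length input
    }
    out.resize(out.size() - outLeft);            // keep only the bytes iconv wrote
    return out;
}
```

Calling ucs2leToUtf8(std::string("R\0o\0b\0", 6)) should yield "Rob"; odd-length or otherwise invalid input makes iconv() fail and the sketch throws.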

My priority at the moment is to write a book Image Metadata and Exiv2 Architecture and then retire (I'll be 70 in January). I hope to give a talk about the book at LGM in Rennes in May 2021 and to run an afternoon workshop. Current draft: https://clanmills.com/book/exiv2

@tester0077
Collaborator

tester0077 commented Aug 4, 2020 via email

@clanmills
Collaborator

Thanks Arnold (@tester0077). I butchered cmake/FindIconv.cmake specially for you! I am curious about using libiconv on Windows - however I don't want to get distracted from working on ISOBMFF (CR3 and HEIF) for the book this week.

By the way, I researched and documented both PSD and IPTC last week. I documented the mysterious "Iptc.Envelope.CharacterSet" https://clanmills.com/exiv2/book/#IPTC

I will revisit UserComment for the book. I haven't spoken to the Chinese contributor yet about doing charset testing. I'll speak to him when/if he finishes his current assignment.

The book is, of course, more work than I planned or expected. Initially, I thought "if I document TiffVisitor, that's enough". That's done, and now I'm documenting every last nook and cranny. I've no idea how Exiv2 interacts with the Adobe XMPsdk and only a vague idea about preview images. Lots of work ahead.

@tester0077
Collaborator

tester0077 commented Aug 4, 2020 via email

@clanmills
Collaborator

clanmills commented Aug 4, 2020

I'm really good at bringing my projects in on schedule and on budget, and I seldom cut the spec. However, I resist feature creep. In the case of the book, I shipped v1 with v0.27.3. The book at that time was more or less what I had in mind as the finished product. For LGM in Rennes it'll be v5, and it will document everything I know about Exiv2 and metadata.

1157 rmills@rmillsmbp:~/gnu/github/exiv2/0.27-maintenance $ exiv2 -pa --grep artist/i test/data/Reagan.jpg 
Exif.Image.Artist                            Ascii      34  Photographerís Mate 3rd Class (A
1158 rmills@rmillsmbp:~/gnu/github/exiv2/0.27-maintenance $ 

I have the hi-res original of that in my Wallpaper's folder. I found it on the US Navy Website. Beautiful Photo. Using my utility dmpf.cpp (from the book) and tvisitor.cpp, here's what's in the file:

678 rmills@rmillsmm-local:~/gnu/exiv2/team/book/build $ ./dmpf '/Users/rmills/Google Drive/Wallpapers/USS Ronald Reagan.jpg' skip=$((30+654)) count=100 width=20
   0x2ac      684: Photographer..s Mate  ->  50 68 6f 74 6f 67 72 61 70 68 65 72 c3 ad 73 20 4d 61 74 65
   0x2c0      704:  3rd Class (A__&.._.  ->  20 33 72 64 20 43 6c 61 73 73 20 28 41 00 00 26 82 9a 00 05
   0x2d4      724: ___.__.~.._.___.__..  ->  00 00 00 01 00 00 04 7e 82 9d 00 05 00 00 00 01 00 00 04 86
   0x2e8      744: ."_.___._.__.__.___.  ->  88 22 00 03 00 00 00 01 00 01 00 00 90 00 00 07 00 00 00 04
   0x2fc      764: 0220.._.___.__...._.  ->  30 32 32 30 90 03 00 02 00 00 00 14 00 00 04 8e 90 04 00 02
679 rmills@rmillsmm-local:~/gnu/exiv2/team/book/build $ 

Somebody has used c3 ad as an apostrophe. I suspect "Adobe Photoshop CS Macintosh".

I installed vcpkg by downloading the code from GitHub and building it myself. Visual Studio didn't seem to be affected. Perhaps you installed it in a different way, or you've uncovered something I haven't noticed, as I hardly ever use Visual Studio these days. I used it all day, every day at Adobe for 10 years.

Although I've documented "Iptc.Envelope.CharacterSet", I don't think we should mess with that. However, charset encoding in UserComment is important and I'd like to be certain that it really works. I've tested it with UNICODE on Windows. However, it needs a serious workout by a native Japanese, Hindi, Mandarin or Arabic speaker.

@norbertj42
Author

I have now checked the image which gives "charset=Ascii binary comment" as its UserComment. The problem: the text contains German umlauts (ä, ö, ü), which are not ASCII characters, so the value does not match the charset.
Then I made a test with exiv2 0.27.3. I was able to write a UserComment with "charset=Ascii comment äöü". When reading it with 0.27.3, I got "binary comment", whereas exiv2 0.27.2 returned the value with the non-ASCII characters wrongly represented.
So for the command line the new behaviour might be OK, but I still think it is a step backwards for those using the library in a GUI, where the non-ASCII characters can be displayed correctly.

As I already mentioned, I would personally prefer to have the behaviour of 0.27.2 back (not returning "binary comment", and returning the interpreted value without the leading charset information). But as I did not understand the details of #1046, I am not able to judge whether going back is a good solution. Anyhow, as exiv2 is open source, I am able to adjust the code for my needs.

An idea: does it make sense to reject writing charset=Ascii with non-ASCII characters? This would ensure that exiv2 does not violate the specification, and it is somewhat strange that exiv2 is able to write something it cannot read. As opinions may differ about this idea, the check could be disabled by a #define.
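A minimal sketch of the check proposed above, written as a standalone function rather than as a patch to Exiv2. It assumes the value arrives in the same "charset=Ascii text" form that the exiv2 -M'set ...' command accepts, and uses a hypothetical EXIV2_STRICT_ASCII_COMMENT macro as the #define switch; neither the macro nor checkUserCommentForWrite() exists in Exiv2.

```cpp
#include <stdexcept>
#include <string>

#ifndef EXIV2_STRICT_ASCII_COMMENT
#define EXIV2_STRICT_ASCII_COMMENT 1  // hypothetical switch: define as 0 to disable the check
#endif

// True if every byte is 7-bit ASCII (0x00..0x7F).
bool isSevenBit(const std::string& s) {
    for (unsigned char ch : s) {
        if (ch > 0x7F) return false;
    }
    return true;
}

// Hypothetical pre-write validation for a UserComment value such as
// "charset=Ascii My photo". Throws if charset=Ascii is combined with
// non-ASCII text (for example UTF-8 or Latin-1 umlauts).
void checkUserCommentForWrite(const std::string& value) {
    const std::string prefix = "charset=Ascii ";
    if (EXIV2_STRICT_ASCII_COMMENT && value.compare(0, prefix.size(), prefix) == 0) {
        if (!isSevenBit(value.substr(prefix.size()))) {
            throw std::invalid_argument("charset=Ascii comment contains non-ASCII bytes");
        }
    }
}
```

With the macro left at 1, "charset=Ascii My photo" passes while "charset=Ascii äöü" is rejected; charset=Unicode values are not checked at all.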

@tester0077
Collaborator

tester0077 commented Aug 5, 2020 via email

@clanmills
Collaborator

Gentlemen. My understanding is that you can put any sequence of 8-bit byte values into an Exif ASCII tag. I believe the standard says that the ASCII type is intended for 7-bit ASCII values (32-127) and that the value should be NUL-terminated (the NUL is included in the count).
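To make the 7-bit and NUL rules concrete with the Reagan Artist value dumped earlier: "Photographer" (12 bytes), the two bytes c3 ad, and "s Mate 3rd Class (A" (19 bytes) make 33 bytes of text; the NUL that follows in the dump brings this to the count of 34 reported by exiv2, and exactly two bytes fall outside the 7-bit range. The sketch below measures a raw value against that rule; AsciiTagReport and describeAsciiTag() are hypothetical names used only for illustration, not Exiv2 code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical summary of how a raw Exif ASCII value measures up to the
// "7-bit ASCII, NUL-terminated" rule.
struct AsciiTagReport {
    std::size_t count;      // Exif count: text bytes plus the trailing NUL
    std::size_t highBytes;  // bytes >= 0x80, i.e. not 7-bit ASCII
    bool nulTerminated;     // last byte is 0x00
};

AsciiTagReport describeAsciiTag(const std::string& raw) {
    AsciiTagReport r{raw.size(), 0, !raw.empty() && raw.back() == '\0'};
    for (unsigned char ch : raw) {
        if (ch >= 0x80) ++r.highBytes;
    }
    return r;
}
```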

In this test file, the mysterious 2-byte encoding of the apostrophe came with the file from the internet.

1193 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ exiv2 -pa --grep Artist ~/gnu/github/exiv2/0.27-maintenance/test/data/Reagan.tiff 
Exif.Image.Artist                            Ascii      34  Photographerís Mate 3rd Class (A
1194 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ exiv2 -pa --grep Artist ~/gnu/github/exiv2/0.27-maintenance/test/data/Reagan.tiff  | od -a 
0000000    E   x   i   f   .   I   m   a   g   e   .   A   r   t   i   s
0000020    t  sp  sp  sp  sp  sp  sp  sp  sp  sp  sp  sp  sp  sp  sp  sp
0000040   sp  sp  sp  sp  sp  sp  sp  sp  sp  sp  sp  sp  sp   A   s   c
0000060    i   i  sp  sp  sp  sp  sp  sp   3   4  sp  sp   P   h   o   t
0000100    o   g   r   a   p   h   e   r   ?   ?   s  sp   M   a   t   e
0000120   sp   3   r   d  sp   C   l   a   s   s  sp   (   A  nl        

Exiv2 is a metadata tool and not a metadata policeman. Exiv2 will not prevent you from putting other bytes into an ASCII string. For that matter, you could define the tag "Exif.Image.Artist" to have Rational or other values:

$ exiv2 -M'set Exif.Image.Artist Rational 3/2 1/2' Reagan.tiff 
1206 rmills@rmillsmbp:~/temp $ exiv2 -pa --grep Artist Reagan.tiff 
Exif.Image.Artist                            Rational    2  3/2 1/2
$

UserComment is defined to have a charset and a binary stream. The use of this feature is documented in the man page exiv2.1, which shipped with Exiv2 v0.27.3.
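As a rough illustration of that layout: in the Exif specification, the first 8 bytes of a UserComment hold a character-code identifier ("ASCII", "JIS" or "UNICODE" padded with NULs, or 8 NULs for undefined), and the comment bytes follow. The ASCII___ and UNICODE_ prefixes in the tvisitor dump further down in this thread are exactly these headers. The sketch below, using the hypothetical names RawComment and splitUserComment(), splits a raw value into the charset name as spelled in exiv2's charset= syntax and the still-undecoded payload.

```cpp
#include <string>

// Hypothetical result type: charset name as spelled in exiv2's
// "charset=..." syntax plus the undecoded comment bytes.
struct RawComment {
    std::string charset;
    std::string payload;
};

// Split a raw UserComment value into its 8-byte character code and data.
RawComment splitUserComment(const std::string& raw) {
    if (raw.size() < 8) {
        return {"Undefined", ""};  // too short to carry a header
    }
    const std::string header = raw.substr(0, 8);
    const std::string payload = raw.substr(8);
    if (header == std::string("ASCII\0\0\0", 8))   return {"Ascii", payload};
    if (header == std::string("JIS\0\0\0\0\0", 8)) return {"Jis", payload};
    if (header == std::string("UNICODE\0", 8))     return {"Unicode", payload};
    return {"Undefined", payload};  // 8 NULs or an unrecognised code
}
```

Decoding the payload (for example UCS-2 to UTF-8 for Unicode) is a separate step that this sketch deliberately leaves out.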

@norbertj42 has raised the topic of interoperability with other applications. My first concern is to correctly implement the standard. If other applications are in conflict with the Exiv2 implementation, I am willing to investigate ways to accommodate them. However before dealing with interoperability, I would like the Exiv2 implementation to be tested by a native speaker of a language which requires UNICODE or other charset support.

@clanmills
Collaborator

Guys: I'm out of sync. I've seen your messages in the wrong order. I'll investigate tomorrow.

@tester0077
Collaborator

@clanmills
0x2ac 684: Photographer..s Mate -> 50 68 6f 74 6f 67 72 61 70 68 65 72 c3 ad 73 20 4d 61 74 65
===
FWIW, These 2 bytes are the encoding for the UTF-8 character sequence for 0xC3 0xAF.
See: https://www.utf8-chartable.de/ , which corresponds to the character of an i with accent aigu.
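The arithmetic behind that decoding: a two-byte UTF-8 sequence 110xxxxx 10yyyyyy carries the code point xxxxxyyyyyy, so 0xC3 0xAD gives (0x03 << 6) | 0x2D = 0xED, i.e. U+00ED. A small sketch that performs only this two-byte step (decodeTwoByteUtf8 is a hypothetical name, and overlong forms are not rejected):

```cpp
#include <cstdint>
#include <stdexcept>

// Decode one two-byte UTF-8 sequence (110xxxxx 10yyyyyy) into its code point.
// Sketch only: longer sequences are rejected and overlong leads (0xC0, 0xC1)
// are not checked.
std::uint32_t decodeTwoByteUtf8(unsigned char lead, unsigned char cont) {
    if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80) {
        throw std::invalid_argument("not a two-byte UTF-8 sequence");
    }
    // Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    return (static_cast<std::uint32_t>(lead & 0x1F) << 6) |
           static_cast<std::uint32_t>(cont & 0x3F);
}
```

For comparison, 0xC3 0xAF decodes to U+00EF (ï, i with diaeresis), so the bytes in the dump are indeed the acute form.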

@norbertj42, can you share a test image with the UserComment, the string you expect to find in that field, and which app placed the text there?
I have been looking for a usable and trustworthy test image for UserComment, but have had no luck.

@clanmills
Collaborator

Reagan.tiff came from the internet with the UTF-8 data which is being correctly reported by Exiv2. I don't see any case to discuss here.

The specification of UserComment is very different: it supports charsets. I believe ExifTool also supports charsets. Can somebody compare the two, please? And I can't simply restore the previous behaviour, because that would break the fix for #1046.

@norbertj42
Author

norbertj42 commented Aug 6, 2020

@tester0077
I am not using exiv2lib directly but build my own DLL from the exiv2 sources plus an additional module that serves as the interface to my GUI, which is written in C#. With this approach I had no problems with umlauts until 0.27.3, when they, in combination with charset=Ascii, resulted in "binary comment".

@clanmills
Attached are the test files I used to compare the behaviour. The filename indicates which program wrote the file. The text written always starts with "äöüßÄÖÜ" followed by text indicating the input field of the respective tool (e.g. Exif-UserComment). I used:

Windows
Exif Pilot 5.14.0 writing Ascii
ExifToolGUI 5.16 with ExifTool 12.01 writing Unicode
MetaEditor 3.3.4.0 writing Unicode
Metalith 10.07 writing Ascii

Android
Exif Pro 0.0.9 writing Unicode

UserComment-images.zip

@clanmills
Collaborator

clanmills commented Aug 6, 2020

Thanks for posting this, @norbertj42. Are we discussing only UserComment? You mentioned an API which has changed with 0.27.3. Which one: toString() or print()? Do you know the old and new behaviour?

@tester0077 I'm not sure it's helpful to bring Exif.Image.Artist into this discussion. I've checked the Exif specification for ASCII and it says (page 14 of the 2.2 spec):

2 = ASCII An 8-bit byte containing one 7-bit ASCII code. The final byte is terminated with NULL.

Exiv2 is not enforcing the '7-bit' ASCII code. If you want that changed, please open a new issue.

@clanmills
Collaborator

clanmills commented Aug 6, 2020

Thanks for your files.

Here's what I can see with Exiv2:

1245 rmills@rmillsmbp:~/Downloads/UserComment-images $ exiv2 -g UserComment -g Artist *.jpg
ExifPilot.jpg     Exif.Image.Artist             Ascii      26  äöüßÄÖÜ Exif-Artist
ExifPilot.jpg     Exif.Photo.UserComment        Undefined  35  charset=Ascii binary comment
ExifPro.jpg       Exif.Image.Artist             Ascii      27  äöüßÄÖÜ Exif-Artist
ExifPro.jpg       Exif.Photo.UserComment        Undefined  56  charset=Unicode äöüßÄÖÜ Exif-Usercomment
ExifToolGUI.jpg   Exif.Image.Artist             Ascii      25  äöüÄÖÜ Exif-Artist
ExifToolGUI.jpg   Exif.Photo.UserComment        Undefined  56  charset=Unicode äöüßÄÖÜ Exif-UserComment
MetaEditor.jpg    Exif.Image.Artist             Ascii      26  ??????? Exif Image.Artist
MetaEditor.jpg    Exif.Photo.UserComment        Undefined  56  charset=Unicode äöüßÄÖÜ Exif-UserComment
Metalith.jpg      Exif.Photo.UserComment        Undefined  32  charset=Ascii binary comment
1246 rmills@rmillsmbp:~/Downloads/UserComment-images $ 

And here's what I can see with tvisitor (the program in my book: https://clanmills.com/exiv2/book).

STRUCTURE OF JPEG FILE (II): /Users/rmills/Downloads/UserComment-images/ExifPilot.jpg
   address |    tag                              |      type |    count |    offset | value
        10 | 0x013b Exif.Image.Artist            |     ASCII |       26 |        38 | .............. Exif-Artist
        66 | 0x9286 Exif.Photo.UserComment       | UNDEFINED |       35 |        82 | ASCII___.............. Exif-Comment
END: /Users/rmills/Downloads/UserComment-images/ExifPilot.jpg

STRUCTURE OF JPEG FILE (II): /Users/rmills/Downloads/UserComment-images/ExifPro.jpg
        46 | 0x013b Exif.Image.Artist            |     ASCII |       27 |       102 | äöüßÄÖÜ Exif-Artist
       156 | 0x9286 Exif.Photo.UserComment       | UNDEFINED |       56 |       196 | UNICODE__._._._._._._._ _E_x_i_f_-_U +++
END: /Users/rmills/Downloads/UserComment-images/ExifPro.jpg

STRUCTURE OF JPEG FILE (II): /Users/rmills/Downloads/UserComment-images/ExifToolGUI.jpg
        46 | 0x013b Exif.Image.Artist            |     ASCII |       25 |       114 | äöüÄÖÜ Exif-Artist
       208 | 0x9286 Exif.Photo.UserComment       | UNDEFINED |       56 |       248 | UNICODE__._._._._._._._ _E_x_i_f_-_U +++
END: /Users/rmills/Downloads/UserComment-images/ExifToolGUI.jpg

STRUCTURE OF JPEG FILE (II): /Users/rmills/Downloads/UserComment-images/MetaEditor.jpg
        10 | 0x013b Exif.Image.Artist            |     ASCII |       26 |      2122 | ‰ˆ¸flƒ÷‹ Exif Image.Artist
      2180 | 0x9286 Exif.Photo.UserComment       | UNDEFINED |       56 |      4268 | UNICODE__._._._._._._._ _E_x_i_f_-_U +++
END: /Users/rmills/Downloads/UserComment-images/MetaEditor.jpg

STRUCTURE OF JPEG FILE (II): /Users/rmills/Downloads/UserComment-images/Metalith.jpg
        40 | 0x9286 Exif.Photo.UserComment       | UNDEFINED |       32 |        84 | ASCII___....... Exif-UserComment
END: /Users/rmills/Downloads/UserComment-images/Metalith.jpg

Let's keep working on this until I reach the "Ah, I see what you're talking about" moment. For sure, I'm currently totally lost.

@clanmills
Collaborator

I have built and installed exiv2 v0.27.2 on my machine. And now I see:

1301 rmills@rmillsmbp:~/Downloads/UserComment-images $ exiv2 -g UserComment -g Artist *.jpg
ExifPilot.jpg         Exif.Image.Artist                            Ascii      26  äöüßÄÖÜ Exif-Artist
ExifPilot.jpg         Exif.Photo.UserComment                       Undefined  35  äöüßÄÖÜ Exif-Comment
ExifPro.jpg           Exif.Image.Artist                            Ascii      27  äöüßÄÖÜ Exif-Artist
ExifPro.jpg           Exif.Photo.UserComment                       Undefined  56  äöüßÄÖÜ Exif-Usercomment
ExifToolGUI.jpg       Exif.Image.Artist                            Ascii      25  äöüÄÖÜ Exif-Artist
ExifToolGUI.jpg       Exif.Photo.UserComment                       Undefined  56  äöüßÄÖÜ Exif-UserComment
MetaEditor.jpg        Exif.Image.Artist                            Ascii      26  ??????? Exif Image.Artist
MetaEditor.jpg        Exif.Photo.UserComment                       Undefined  56  äöüßÄÖÜ Exif-UserComment
Metalith.jpg          Exif.Photo.UserComment                       Undefined  32  ??????? Exif-UserComment
1302 rmills@rmillsmbp:~/Downloads/UserComment-images $ 

So the Exif.Image.Artist output is identical in v0.27.3 and v0.27.2:

1245 rmills@rmillsmbp:~/Downloads/UserComment-images $ exiv2 -g UserComment -g Artist *.jpg
ExifPilot.jpg     Exif.Image.Artist             Ascii      26  äöüßÄÖÜ Exif-Artist
ExifPro.jpg       Exif.Image.Artist             Ascii      27  äöüßÄÖÜ Exif-Artist
ExifToolGUI.jpg   Exif.Image.Artist             Ascii      25  äöüÄÖÜ Exif-Artist
MetaEditor.jpg    Exif.Image.Artist             Ascii      26  ??????? Exif Image.Artist
1246 rmills@rmillsmbp:~/Downloads/UserComment-images $ 

So, we're focused now on Exif.Photo.UserComment which in v0.27.2 reported:

ExifPilot.jpg         Exif.Photo.UserComment                       Undefined  35  äöüßÄÖÜ Exif-Comment
ExifPro.jpg           Exif.Photo.UserComment                       Undefined  56  äöüßÄÖÜ Exif-Usercomment
ExifToolGUI.jpg       Exif.Photo.UserComment                       Undefined  56  äöüßÄÖÜ Exif-UserComment
MetaEditor.jpg        Exif.Photo.UserComment                       Undefined  56  äöüßÄÖÜ Exif-UserComment
Metalith.jpg          Exif.Photo.UserComment                       Undefined  32  ??????? Exif-UserComment

And is now (in 0.27.3) reporting:

ExifPilot.jpg     Exif.Photo.UserComment        Undefined  35  charset=Ascii binary comment
ExifPro.jpg       Exif.Photo.UserComment        Undefined  56  charset=Unicode äöüßÄÖÜ Exif-Usercomment
ExifToolGUI.jpg   Exif.Photo.UserComment        Undefined  56  charset=Unicode äöüßÄÖÜ Exif-UserComment
MetaEditor.jpg    Exif.Photo.UserComment        Undefined  56  charset=Unicode äöüßÄÖÜ Exif-UserComment
Metalith.jpg      Exif.Photo.UserComment        Undefined  32  charset=Ascii binary comment

I will investigate. You've mentioned that this change of behaviour appears to come from the API toString(). I'll get back to you later today.

@tester0077 I know you're concerned about your umlauts, and so you should be! Please open a new issue to discuss Exif.Image.Artist. My current thoughts are:

  1. It's been this way since Andreas brought the tablets of Exiv2 down from the Mountain Top before I joined the project in 2008.
  2. The UTF-8 code in Reagan.tiff was put there by another application. For sure the file was written by Photoshop. It's not a good idea to refuse to read files written by Photoshop. We'll silently preserve that data.
  3. When you set an ASCII tag in Exiv2, we should throw for non-7-bit characters.
  4. We continue to allow the user to set any value such as Rational in the metadata.

@clanmills
Collaborator

This has been caused by the following code in v0.27.3 in src/value.cpp:

    std::string CommentValue::comment(const char* encoding) const
    {
        std::string c;
        if (value_.length() < 8) {
            return c;
        }
        c = value_.substr(8);
        if (charsetId() == unicode) {
            const char* from = encoding == 0 || *encoding == '\0' ? detectCharset(c) : encoding;
            convertStringCharset(c, from, "UTF-8");
        } else {
            // charset=undefined reports "binary comment" if it contains non-printable bytes
            // this is to ensure no binary bytes in the output stream.
            if ( isBinary(c) ) {
                c = "binary comment" ;
            }
        }
        return c;
    }

The previous code was:

    std::string CommentValue::comment(const char* encoding) const
    {
        std::string c;
        if (value_.length() < 8) {
            return c;
        }
        c = value_.substr(8);
        if (charsetId() == unicode) {
            const char* from = encoding == 0 || *encoding == '\0' ? detectCharset(c) : encoding;
            convertStringCharset(c, from, "UTF-8");
        }
        return c;
    }
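To see why the umlaut files changed, here is a standalone model of the non-Unicode branch of the two versions quoted above. It is not the Exiv2 source: the real isBinary() rule is not shown in this thread, so looksBinary() below assumes that any byte outside printable ASCII (0x20 to 0x7E) counts as binary, and commentText() is a hypothetical name.

```cpp
#include <string>

// Assumption: "binary" means any byte outside printable ASCII (0x20..0x7E).
// Exiv2's own isBinary() may use a different rule.
bool looksBinary(const std::string& s) {
    for (unsigned char ch : s) {
        if (ch < 0x20 || ch > 0x7E) return true;
    }
    return false;
}

// Model of the non-Unicode branch of CommentValue::comment():
// v0.27.2 returned the bytes after the 8-byte header unchanged,
// v0.27.3 substitutes "binary comment" when they look binary.
std::string commentText(const std::string& raw, bool v0273) {
    if (raw.size() < 8) return std::string();
    std::string c = raw.substr(8);
    if (v0273 && looksBinary(c)) c = "binary comment";
    return c;
}
```

Under that assumption, umlauts stored with charset=Ascii come back as "binary comment" from the v0.27.3 branch and unchanged from the v0.27.2 branch, which is consistent with the before/after listings earlier in the thread.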

Restoring the old code has two consequences:

1 Binary can get into the output (which causes platform issues with the test suite).
2 I've updated lots of reference files in test/data/*.out to contain 'binary comment'.

As you have your own copy of the code, you already have a work-around. I appreciate that such a work-around has a maintenance "hit", as you will have to remember to patch it into future versions of exiv2. You can, of course, solve that in your code by detecting "binary comment" in the output and taking evasive action.
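One possible form of that evasive action, sketched under explicit assumptions: after "binary comment" shows up, the caller reads the raw bytes after the 8-byte header itself and picks an encoding for display. The tvisitor dump above suggests ExifPilot.jpg stored each umlaut as a two-byte (UTF-8-like) sequence while Metalith.jpg used one byte per umlaut, so the sketch keeps structurally valid UTF-8 as-is and otherwise guesses ISO-8859-1. That guess (rather than Windows-1252 or something else) is an assumption to verify per application, and looksLikeUtf8(), latin1ToUtf8() and displayableComment() are hypothetical helpers, not Exiv2 APIs.

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check: lead and continuation bytes line up.
// Does not reject overlong forms or surrogates; good enough for a heuristic.
bool looksLikeUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char ch = s[i];
        std::size_t extra;
        if (ch < 0x80) extra = 0;
        else if ((ch & 0xE0) == 0xC0) extra = 1;
        else if ((ch & 0xF0) == 0xE0) extra = 2;
        else if ((ch & 0xF8) == 0xF0) extra = 3;
        else return false;
        if (extra > s.size() - i - 1) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// ISO-8859-1 maps each byte directly to code point U+0000..U+00FF.
std::string latin1ToUtf8(const std::string& s) {
    std::string out;
    for (unsigned char ch : s) {
        if (ch < 0x80) {
            out += static_cast<char>(ch);
        } else {
            out += static_cast<char>(0xC0 | (ch >> 6));
            out += static_cast<char>(0x80 | (ch & 0x3F));
        }
    }
    return out;
}

// Hypothetical display helper for the payload of a charset=Ascii comment.
std::string displayableComment(const std::string& payload) {
    return looksLikeUtf8(payload) ? payload : latin1ToUtf8(payload);
}
```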

Because Leo is currently working on porting the bash scripts to python, the horrors of binary output are likely to disappear and we can revisit this matter. So, I propose to leave this open for the moment.

I believe I've explained everything and would welcome your feedback.

@norbertj42
Author

I had a look at the code and made the change you suggested, and it works fine for me, thanks. So this code change solves the problem with binary output.
The other change in behaviour is visible in your output: output from 0.27.3 has a leading "charset=...", which was not the case with 0.27.2. That is where I referred to print(). In 0.27.2, print() returned the value without the leading charset information, whereas toString() included it. Now print() in 0.27.3 also has the charset information. I tried to find where it comes from but have not succeeded so far. To me it does not look related to the binary problem, but it might be a side effect, or is it a change someone wanted? Anyhow, it is easy to remove that prefix before I show the value in my GUI.

@clanmills
Collaborator

Ah, yes. I meant to mention that. It reports charset=Ascii to mirror the syntax required to set the charset. This syntax is used both on the command line and in the API. For example:

$ curl -OL http://clanmills.com/Stonehenge.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6599k  100 6599k    0     0  2420k      0  0:00:02  0:00:02 --:--:-- 2419k
$ exiv2 --version | head -1
exiv2 0.27.3
$ exiv2 -M'set Exif.Photo.UserComment charset=Unicode Robin Mills' Stonehenge.jpg 
$ exiv2 -g UserComment Stonehenge.jpg 
Exif.Photo.UserComment   Undefined  30  charset=Unicode Robin Mills     # 30 = 8+2*11
$ exiv2 -M'set Exif.Photo.UserComment charset=Ascii Robin Mills' Stonehenge.jpg 
$ exiv2 -g UserComment Stonehenge.jpg 
Exif.Photo.UserComment   Undefined  19  charset=Ascii Robin Mills        # 19 = 8+11
$ 
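The counts in those comments follow from the layout: 8 header bytes plus the encoded text, where charset=Ascii uses one byte per character and charset=Unicode two (UCS-2). A tiny sketch of that arithmetic, with a hypothetical expectedCommentCount() that assumes every character fits in one UCS-2 unit:

```cpp
#include <cstddef>

enum class Charset { Ascii, Unicode };

// Exif count for a UserComment holding `chars` characters:
// 8-byte character-code header + 1 byte (Ascii) or 2 bytes (UCS-2) per character.
std::size_t expectedCommentCount(Charset cs, std::size_t chars) {
    return 8 + chars * (cs == Charset::Unicode ? 2 : 1);
}
```

With "Robin Mills" (11 characters) this gives 30 for Unicode and 19 for Ascii, matching the exiv2 output above.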

By the way, I think it's a good idea for you to put something in your code to parse the value from toString(), as that will make your code resilient to further changes I might make. For example, if I totally restore the <= 0.27.2 behaviour, I don't want to break your code! The syntax is [charset=blob] value-string, and value-string can be 'binary comment'.
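A minimal sketch of such a parser, for callers that receive the string form of a comment (for example from toString()) and want to show only the text. It assumes the syntax described here, an optional charset=Name prefix followed by a single space, and treats the literal text "binary comment" as the marker that the bytes were withheld; ParsedComment and parseCommentString() are hypothetical names.

```cpp
#include <string>

// Hypothetical parse result for "[charset=Name ]value-string".
struct ParsedComment {
    std::string charset;  // e.g. "Ascii", "Unicode"; empty if no prefix
    std::string text;     // comment text without the prefix
    bool binary = false;  // exiv2 replaced the bytes with "binary comment"
};

ParsedComment parseCommentString(const std::string& s) {
    ParsedComment p;
    const std::string key = "charset=";
    std::string rest = s;
    if (s.compare(0, key.size(), key) == 0) {
        const std::string::size_type space = s.find(' ', key.size());
        if (space == std::string::npos) {
            p.charset = s.substr(key.size());  // prefix only, no text
            rest.clear();
        } else {
            p.charset = s.substr(key.size(), space - key.size());
            rest = s.substr(space + 1);
        }
    }
    p.text = rest;
    p.binary = (rest == "binary comment");
    return p;
}
```

For example, parseCommentString("charset=Unicode Robin Mills") yields charset "Unicode" and text "Robin Mills", while a value without a prefix, such as "HYBRID-FIX", is returned unchanged with an empty charset.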

Here's the description from exiv2.1, the man page:

              The format of Exif Comment values includes an optional charset
              specification at the beginning. Comments are used by the tags
              Exif.Photo.UserComment, Exif.GPSInfo.GPSProcessingMethod and
              Exif.GPSInfo.GPSAreaInformation. Comments are stored as Undefined
              tags with an 8 byte encoding definition followed by the encoded
              data. The charset is specified as follows:

              [charset=Ascii|Jis|Unicode|Undefined] comment
              charset=Undefined is the default

              $ exiv2 -M'set Exif.Photo.UserComment charset=Ascii My photo' x.jpg
              $ exiv2 -pa --grep UserComment x.jpg
              Exif.Photo.UserComment              Undefined  16  My photo
              $ exiv2 -pv --grep UserComment x.jpg
              0x9286 Photo    UserComment         Undefined  16  charset=Ascii My photo

              $ exiv2 -M'set Exif.Photo.UserComment charset=Unicode \u0052\u006f\u0062\u0069\u006e' x.jpg
              $ exiv2 -pa --grep UserComment x.jpg
              Exif.Photo.UserComment              Undefined  18  Robin
              $ exiv2 -pv --grep UserComment x.jpg
              0x9286 Photo    UserComment         Undefined  18  charset=Unicode Robin

              $ exiv2 -M'set Exif.GPSInfo.GPSProcessingMethod HYBRID-FIX' x.jpg
              $ exiv2 -pa --grep ProcessingMethod x.jpg
              Exif.GPSInfo.GPSProcessingMethod    Undefined  18  HYBRID-FIX
              $ exiv2 -pv --grep ProcessingMethod x.jpg
              0x001b GPSInfo  GPSProcessingMethod Undefined  18  HYBRID-FIX

If you're happy, could you close this issue, please? I intend to ask Leo to work on UserComment if/when he finishes #1215. He might decide he's happy with binary output from the exiv2 command and restore printing the bytes. I will advocate retaining charset=Encoding.

I'm very happy that you have raised this subject as it's something that should be documented in my book. It's been good to revisit this subject and refresh my (old and getting older) brain about charset.

Arnold and I have been working together for years and I have no doubt he'll challenge me about Exif.Image.Artist with his usual enthusiasm.

@norbertj42
Author

@clanmills Thank you for your support; I'm closing the issue now and hope that Leo will find a solution that works with binary output.
@tester0077 Thanks for your contribution to the discussion.

@clanmills
Collaborator

@norbertj42 I'm sure Leo will deal with binary output from exiv2 and we will restore outputting the string while retaining the charset=Encoding.

As I haven't worked with Leo before, it's hard to know when (or if) he will finish his assignment.

@tester0077
Collaborator

tester0077 commented Aug 6, 2020 via email

@tester0077
Collaborator

@norbertj42 You might want to check out the utility WPMeta at
http://www.pilwousek.de/WPSoft/

@clanmills
Collaborator

In the process of working on this, I discovered an issue with setting GPSProcessingMethod in samples/geotag.cpp. I've submitted a fix: #1268

The user who opened #1046 (@nicofooo) has raised another matter related to this in #1266. I've added code to strip trailing NULs from comments: #1067.

@clanmills
Collaborator

I have been wondering whether charset=Unicode support for Chinese works adequately. The following comment by @LeoHsiao1 confirms that it is satisfactory: #1279 (comment)

Very pleased to have @LeoHsiao1 working with us.

@vinc17fr

vinc17fr commented Oct 9, 2020

I don't know whether this is related, but the new behavior of libexiv2 0.27.3 makes gthumb display "charset=Ascii" before the comment. This is bad.

@clanmills
Collaborator

This behaviour has been changed for good reason. How difficult is it for gthumb to detect and ignore the "charset=xxxx " prefix?

@vinc17fr

vinc17fr commented Oct 9, 2020

I don't know, but one issue is that this wasn't announced, so there was no chance to warn developers and block the upgrade until the applications had been updated. The consequence is that a bug suddenly appeared in gthumb (and in other applications, I assume).

@clanmills
Collaborator

There were 2 release candidates on 2020-04-30 and 2020-05-31 before v0.27.3 shipped on 2020-06-30. The release candidates were announced on Facebook and the forum https://discuss.pixls.us There was an opportunity to provide me feedback.

Please understand that I'm working on my own. I don't have any global view of who and why people use Exiv2. I always do my best. I don't always succeed. This doesn't feel very important or painful to me.

@vinc17fr

Note that I'm just an end user of gthumb via a binary distribution, so I obviously couldn't check. I don't know why gthumb developers did not notice the issue. I have just reported a bug in its BTS: https://gitlab.gnome.org/GNOME/gthumb/-/issues/137

@clanmills
Collaborator

Thank You, @vinc17fr

I notified the community in mid-March of my plan to release Exiv2 v0.27.3. The primary motivation for v0.27.3 concerned charset= handling. The proposal was executed as defined. Release candidates were provided as scheduled.

I'm reluctant to revert the 0.27.3 behaviour as this may cause an avalanche of further criticism.

Exiv2 is more-or-less a one man project. I cannot know the impact of every change on every user. For example, I've never heard of gthumb.

I understand that you are upset by this change. I am interested to know if the gthumb engineers tested their code with the release candidates for Exiv2 v0.27.3.

@vinc17fr

FYI, it is also buggy in nomacs (via Panels → Metadata Info, then Exif → Photo), so there may be something wrong with the communication. However, this is much less of an issue in nomacs than in gThumb since, unlike gThumb, nomacs primarily uses "Image Description" rather than "User Comment".

Well, the good point about this is that I've found a better image viewer than gThumb.

@clanmills
Collaborator

What do you want from me? We have discussed this and there is nothing more to be said. Please leave me alone.

novomesk pushed a commit to nomacs/nomacs that referenced this issue Mar 22, 2024
…1049)

Converting the UserComment exif metadatum to string will result in its
direct/quasi-internal string representation of libexiv2, which may
include a "charset=..." prefix with the charset of the value.

Since we want the actual content/value of UserComment, and the
Exiv2::Value held by Exiv2::Exifdatum is Exiv2::CommentValue, then
cast it to call comment(). The result is converted to QString using
QString::fromStdString(), which converts std::string as UTF-8 string.

For further details, see also the exiv2 ticket:
Exiv2/exiv2#1258

Signed-off-by: Pino Toscano <toscano.pino@tiscali.it>
4 participants