Improved charset handling in UserComment: what does it mean? #1258
Comments
Good question. I recommend that you read the discussion with Phil Harvey of ExifTool about this: #1046 I updated the man page exiv2.1 to explain how this is intended to work. Before v0.27.3 it was not documented. I find charset (and all Unicode/non-ASCII) matters difficult to understand. I have a new contributor from China working with me at the moment on #1215 (currently PR #1257) and I have been thinking about asking him to review and test the charset code after he completes his current assignment. To address your concern about the API 1 In times gone by 2 You can get the "raw" value of stored bytes with another API. I think it's If you'd like to get involved in this, I would be delighted to accept your help. |
Thanks for the details and references. I have read them, but my C++ is too poor to understand them completely. Some time ago I also read the thread #662 "Incorrect Unicode encoding of Exif UserComment tag", because I noticed that a UserComment written by one program is often not displayed correctly by other programs if the text contains German umlauts (ä, ö, ü). But by using toString() and, if needed, converting the encoding, I was able to display UserComment in a readable way for most of the images written by the other metadata editors I tried. Having a closer look now, I found that this "compatibility" was lost when I migrated to 0.27.3, because in some cases I now see only "charset=Ascii binary comment". Based on the discussion in #1046, I will check whether the other programs write the tag according to the specification. Even if the error is on their side, it would be great to have exiv2 as tolerant as it was in 0.27.2. I would like to help and give something back to this project, which enabled my program. But I have just the basics in C++ and my environment is Windows and Microsoft Visual Studio. If this helps, please let me know. |
Well the list of things TODO never gets shorter. Not because I'm not making progress, it's because we live in an expanding universe! Here's something you could explore with your tools: #1255 By default VCPKG builds 32-bit exiv2 and:
1 Doesn't build the sample applications
2 Doesn't copy the DLLs into build/bin
It would be good to figure out how to run the test suite on that - especially when Leo finishes #1215. A matter closely related to #1255 is building libiconv on Windows. I see that vcpkg builds libiconv and links it into Exiv2. I'd like to test that with the UserComment. I recently added CMake code to guarantee that Exiv2/MSVC never links libiconv on 0.27-maintenance. I did that because a user kept linking Cygwin64/libiconv with Exiv2 and sending me bug reports. My priority at the moment is to write a book Image Metadata and Exiv2 Architecture and then retire (I'll be 70 in January). I hope to give a talk about the book at LGM in Rennes in May 2021 and to run an afternoon workshop. Current draft: https://clanmills.com/book/exiv2 |
Hi all,
this issue is very dear to my heart, as I also have to deal with
metadata text strings in other than plain ASCII.
In my case, these strings were not part of UserComment, but rather of
some of the other metadata fields.
As it turns out, I am likely the guy who Robin is thinking of & who sent
him all those bug reports encountered during my 32-bit Windows build of
Exiv2lib for my app.
Some of these build problems were resolved by my using vcpkg to install
some of the dependencies, but I have meanwhile backed away from vcpkg
until I can sort out some of the issues I have run into in my setup.
Vcpkg, when installed the default way, takes over all instances of all
IDEs. For my environment, that is a problem & I have contacted MS, but
have not had any resolution yet.
Still, I have built a 32-bit exiv2lib for my app and if necessary, I
might be able to help with getting it compiled for someone else.
If there are any images with such problem strings, I would not mind
testing them on my app - mainly intended to be able to compare the
metadata in two images. It is nowhere near finished, but the basic
reading and display of the metadata works for my example files.
Arnold
|
Thanks Arnold (@tester0077). I butchered cmake/FindIconv.cmake specially for you! I am curious about using libiconv on Windows - however I don't want to get distracted from working on ISOBMFF (CR3 and HEIF) for the book this week. By the way, I researched and documented both PSD and IPTC last week. I documented the mysterious "Iptc.Envelope.CharacterSet" https://clanmills.com/exiv2/book/#IPTC I will revisit UserComment for the book. I haven't spoken to the Chinese contributor yet about doing charset testing. I'll speak to him when/if he finishes his current assignment. The book is of course more work than I planned or expected. Initially, I thought "if I document TiffVisitor, that's enough." That's done, and now I'm documenting every last nook and cranny. I've no idea how Exiv2 interacts with the Adobe XMPsdk and only a vague idea about preview images. Lots of work ahead. |
Hi Robin
On 2020-08-04 9:33 AM, Robin Mills wrote:
Thanks Arnold (@tester0077). I
butchered cmake/FindIconv.cmake specially for you! I am curious about
using libiconv on Windows - however I don't want to get distracted
from working on ISOBMFF (CR3 and HEIF) for the book this week.
Yes, there is some work to be done yet and I very much appreciate your
support to get my version to compile & link. As I mentioned, I am a bit
disappointed by vcpkg, at least in my environment. When it is installed
and hooked up to the MSVC IDE, it takes over all tool chains it knows
about, or so it seems. But when it is unhooked or uninstalled, it does
not restore the state to anything like it was before and since it seems
to do all its work 'under the covers', it takes some work to restore a
sane status after it is out of the loop.
By the way, I researched and documented both PSD and IPTC last week. I
documented the mysterious "Iptc.Envelope.CharacterSet"
https://clanmills.com/exiv2/book/#IPTC
Those character sets are quite a handful. My understanding is that there
are code sequences which allow shifting back and forth between character
sets within one image.
The biggest problem for me is to find images with these various
character sets for testing and checking the values extracted.
FWIW, an image I believe I found on the Exiv2 site of the
USSRonaldReagan has, what I believe to be a bad Exif string for the
'Artist' caption. No idea how it was entered, but it can't be right.
The book is of course more work than I planned or expected.
It is always so ;-)
|
I'm really good at bringing my projects in on schedule and to budget. And I seldom cut the spec. However I resist feature creep. In the case of the book, I shipped v1 with v0.27.3. The book at that time was more-or-less what I had in mind to be the finished product. For LGM in Rennes it'll be v5 and will document everything I know about Exiv2 and Metadata.
I have the hi-res original of that in my Wallpaper's folder. I found it on the US Navy Website. Beautiful Photo. Using my utility dmpf.cpp (from the book) and tvisitor.cpp, here's what's in the file:
Somebody has used
I installed vcpkg by downloading the code from GitHub and building it myself. Visual Studio didn't seem to be impacted. Perhaps you installed it in a different way, or you've uncovered something I haven't noticed, as I hardly ever use Visual Studio these days. I used it all-day/every-day at Adobe for 10 years. Although I've documented "Iptc.Envelope.CharacterSet", I don't think we should mess with that. However, using charset encoding in UserComment is important and I'd like to be certain that it really works. I've tested it with UNICODE on Windows. However, it needs a serious workout by a native Japanese, Hindi, Mandarin or Arabic speaker. |
I have now checked the image, which gives "charset=Ascii binary comment" as UserComment. The problem: the text contains German umlauts (ä, ö, ü), and they are not ASCII characters. So the value does not fit the charset. Then I made a test with exiv2 0.27.3. I was able to write UserComment with "charset=Ascii comment äöü". When reading with 0.27.3, it gave binary comment, whereas exiv2 0.27.2 returned the value with a wrong representation of the non-ASCII characters. So for the command line the new behaviour might be OK, but I still think it is a step backward for those using the library in a GUI, where the non-ASCII characters can be displayed correctly. As I already mentioned before, I personally would prefer to have the behaviour of 0.27.2 back (not returning binary comment, interpreted value without leading charset information). But as I did not understand the details of #1046, I am not able to judge whether going back is a good solution. Anyhow, as exiv2 is Open Source, I am able to adjust the code for my needs. An idea: does it make sense to reject writing charset=Ascii with non-ASCII characters? This would ensure that exiv2 does not violate the specification, and it is somewhat strange that exiv2 is able to write something which it cannot read. As opinions may differ about this idea, the check could be disabled by a #define.
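The check proposed in this comment could look roughly like the sketch below. It is an illustration only, not Exiv2 code: the function name `check_ascii_comment` is hypothetical, and the `strict` flag merely stands in for the suggested #define.

```python
def check_ascii_comment(comment: str, strict: bool = True) -> bool:
    """Return True if the comment may be written as charset=Ascii.

    Hypothetical helper, not part of Exiv2. `strict` stands in for the
    compile-time switch suggested in the discussion.
    """
    if not strict:
        # Check disabled: accept anything and leave the decision to the caller.
        return True
    # str.isascii() is True only when every code point is below 128.
    return comment.isascii()


print(check_ascii_comment("plain comment"))              # True
print(check_ascii_comment("comment äöü"))                # False
print(check_ascii_comment("comment äöü", strict=False))  # True
```

With the check enabled, a comment containing umlauts would be refused for charset=Ascii, so a caller would have to pick another charset (or disable the check) before writing.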
Hi Norbert,
Looks like we have the same issue with Umlauts ;-)
Are you needing the command line version to handle these Umlauts or are
you using exiv2lib in a program?
I know the library can extract the string properly, though I have not
yet tried to use it to write anything to an image, nor have I used the
command line very much.
Arnold
|
Gentlemen. My understanding is that you can put any sequence of 8-bit values into an Exif ASCII tag. I believe the standard says that the ASCII type is intended for 7-bit ASCII values (32-127) and they should be NUL terminated. (The NUL is included in the count.) In this test file, the mysterious 2-byte encoding of the apostrophe came with the file from the internet.
Exiv2 is a metadata tool and not a metadata policeman. Exiv2 will not prevent you from putting other bytes into an ASCII string. For that matter, you could define the tag "Exif.Image.Artist" to have Rational or other values:
UserComment is defined to have a charset and a binary stream. The use of this feature is documented in the man page exiv2.1 which shipped with Exiv2 v0.27.3. @norbertj42 has raised the topic of interoperability with other applications. My first concern is to correctly implement the standard. If other applications are in conflict with the Exiv2 implementation, I am willing to investigate ways to accommodate them. However, before dealing with interoperability, I would like the Exiv2 implementation to be tested by a native speaker of a language which requires UNICODE or other charset support. |
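To make "a charset and a binary stream" concrete: the Exif specification stores UserComment as an 8-byte character code followed by the comment bytes. The sketch below splits the two. It is a minimal illustration, not the Exiv2 implementation; the function name `split_user_comment` and the fallback label "Invalid" are assumptions, and the charset labels simply follow the spelling seen in "charset=Ascii". The sample payload (umlauts as Latin-1 bytes) is also an assumption about what such a file might contain.

```python
# The 8-byte character codes defined by the Exif specification for UserComment.
# The labels on the right are chosen to match the "charset=Ascii" spelling.
CHARACTER_CODES = {
    b"ASCII\x00\x00\x00": "Ascii",
    b"JIS\x00\x00\x00\x00\x00": "Jis",
    b"UNICODE\x00": "Unicode",
    b"\x00" * 8: "Undefined",
}


def split_user_comment(raw: bytes):
    """Split raw UserComment bytes into (charset label, payload bytes)."""
    header, payload = raw[:8], raw[8:]
    # "Invalid" is a hypothetical label for an unrecognised or short header.
    return CHARACTER_CODES.get(header, "Invalid"), payload


# Assumed example: an ASCII header followed by bytes that are not 7-bit ASCII.
raw = b"ASCII\x00\x00\x00" + "äöü".encode("latin-1")
charset, payload = split_user_comment(raw)
print(charset, payload.isascii())  # Ascii False
```

A value like this one declares ASCII in its header while carrying bytes above 127, which is the mismatch being discussed in this thread.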
Guys: I'm out of sync. I've seen your messages in the wrong order. I'll investigate tomorrow. |
@clanmills @norbertj42 can you share a test image with the UserComment and what string you expect to find in that field, as well as which app placed the text there? |
Reagan.tiff came from the internet with the UTF-8 data which is being correctly reported by Exiv2. I don't see any case to discuss here. The specification of UserComment is very different. It supports CharSet. I believe exiftool also supports CharSet. Can somebody compare the two, please? And I can't simply restore the previous behaviour because that would break the fix to #1046. |
@tester0077 @clanmills Windows Android |
Thanks for posting this, @norbertj42. Are we discussing only UserComment? You mentioned an API which has changed with 0.27.3. Which one? @tester0077 I'm not sure it's helpful to bring Exif.Image.Artist into this discussion. I've checked the Exif specification for ASCII and it says (page 14 of the 2-2 spec):
Exiv2 is not enforcing the '7-bit' ASCII code. If you want that changed, can you open a new issue? |
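For readers unfamiliar with the Exif ASCII type mentioned here, the following sketch shows what a strictly conforming value looks like: 7-bit bytes, a terminating NUL, and a count that includes that NUL. The helper names `encode_exif_ascii` and `is_seven_bit` are hypothetical, and this is not how Exiv2 behaves, since Exiv2 stores the bytes it is given without enforcing the 7-bit rule.

```python
def encode_exif_ascii(text: str) -> bytes:
    """Encode text as a strict Exif ASCII value: 7-bit bytes plus a NUL."""
    # str.encode("ascii") raises UnicodeEncodeError for non-ASCII input.
    return text.encode("ascii") + b"\x00"


def is_seven_bit(stored: bytes) -> bool:
    """Check whether bytes already stored in a tag are all below 128."""
    return all(byte < 128 for byte in stored)


value = encode_exif_ascii("Artist")
print(len(value))                         # 7: six characters plus the NUL
print(is_seven_bit(value))                # True
print(is_seven_bit("ä".encode("utf-8")))  # False
```

A reader-side check like `is_seven_bit` only reports a violation; what to do with such bytes (show them, re-decode them, reject them) remains a policy decision for the application.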
Thanks for your files. Here's what I can see with Exiv2:
And here's what I can see with tvisitor (the program in my book: https://clanmills.com/exiv2/book).
Let's keep working on this until I reach the "Ah, I see what you're talking about" moment. For sure, I'm currently totally lost. |
I have built and installed exiv2 v0.27.2 on my machine. And now I see:
So Exif.Image.Artist v0.27.3 and v0.27.2 are identical at:
So, we're focused now on Exif.Photo.UserComment which in v0.27.2 reported:
And is now (in 0.27.3) reporting:
I will investigate. You've mentioned that this change of behaviour appears to come from the API @tester0077 I know you're concerned about your umlauts and so you should be! Please open a new issue to discuss Exif.Image.Artist. My current thoughts are:
|
This has been caused by the following code in v0.27.3 in src/value.cpp
The previous code was:
Restoring the old code has two consequences: 1 Binary can get into the output (which causes platform issues with the test suite). As you have your own copy of the code, you already have a work-around. I appreciate that such a work-around has a maintenance "hit", as you will have to remember to patch this into future versions of exiv2. You can of course solve that in your code by detecting that "binary comment" is in the output and taking evasive action. Because Leo is currently working on porting the bash scripts to python, the horrors of binary output are likely to disappear and we can revisit this matter. So, I propose to leave this open for the moment. I believe I've explained everything and would welcome your feedback. |
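The "detect and take evasive action" idea could be sketched as below. This is a hedged illustration, not Exiv2 code: `readable_comment` is a hypothetical name, the caller is assumed to have obtained the raw comment bytes separately, and trying UTF-8 before Latin-1 is just one plausible guess for files containing umlauts.

```python
def readable_comment(to_string_value: str, raw_payload: bytes) -> str:
    """Return text for display, decoding the raw bytes if exiv2 gave up.

    Hypothetical helper. `to_string_value` is what toString() returned;
    `raw_payload` is the comment bytes, assumed to be fetched separately.
    """
    prefix, _, rest = to_string_value.partition(" ")
    if not (prefix.startswith("charset=") and rest == "binary comment"):
        return to_string_value  # exiv2 produced text; nothing to do
    try:
        return raw_payload.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw_payload.decode("latin-1")


print(readable_comment("charset=Ascii binary comment", "äöü".encode("latin-1")))
print(readable_comment("charset=Ascii hello", b"hello"))
```

One limitation worth noting: a genuine comment whose text is literally "binary comment" would be indistinguishable from the marker, so an application with access to the raw bytes may prefer to inspect those directly.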
I had a look in the code and made the change you suggested - and it works fine for me, thanks. So this code change solved the problem with binary output. |
Ah, yes. I meant to mention that. It reports
By the way, I think it's a good idea for you to put something in your code to parse the value from toString() as that will make your code resilient to further changes I might make. For example, if I totally restore the <= 0.27.2 behaviour, I don't want to break your code! The syntax is Here's the description from exiv2.1 the man page:
If you're happy, could you close this issue, please. I intend to ask Leo to work on UserComment if/when he finishes #1215. He might decide he's happy with binary output from the exiv2 command and decide to restore printing the bytes. I will advocate to retain charset=Encoding. I'm very happy that you have raised this subject as it's something that should be documented in my book. It's been good to revisit this subject and refresh my (old and getting older) brain about charset. Arnold and I have been working together for years and I have no doubt he'll challenge me about Exif.Image.Artist with his usual enthusiasm. |
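A minimal sketch of parsing the value returned by toString(), as suggested in this comment, so that application code works whether or not the "charset=..." prefix is present. The function name `split_charset` is hypothetical, and tolerating optional quotes around the charset name is an assumption rather than documented behaviour.

```python
def split_charset(value: str):
    """Split 'charset=Name text' into (Name, text).

    Returns (None, value) when there is no charset prefix, so the same
    code handles output with and without the prefix.
    """
    if not value.startswith("charset="):
        return None, value
    head, _, text = value.partition(" ")
    # Assumption: tolerate a quoted name such as charset="Ascii".
    name = head[len("charset="):].strip('"')
    return name, text


print(split_charset("charset=Ascii My comment"))  # ('Ascii', 'My comment')
print(split_charset("My comment"))                # (None, 'My comment')
```

A GUI application could then display only the text part and keep the charset name for diagnostics.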
@clanmills Thank you for your support; I am closing the issue now and hope that Leo will find a solution that works with binary output. @tester0077 Thanks for your contribution to the discussion. |
@norbertj42 I'm sure Leo will deal with binary output from exiv2 and we will restore outputting the string while retaining the charset=Encoding. As I haven't worked with Leo before, it's hard to know when (or if) he will finish his assignment. |
Glad you got it all sorted out.
I will have a closer look at the images you provided.
Arnold
PS: our e-mail system is being migrated and things are in a mess so I
will have to sort that out first.
|
@norbertj42 You might want to check out the utility WPMeta at |
I have been wondering if the charset=Unicode Chinese support works adequately well. The following comment by @LeoHsiao1 confirms that it is satisfactory. #1279 (comment) Very pleased to have @LeoHsiao1 working with us. |
I don't know whether this is related, but the new behavior of libexiv2 0.27.3 makes gthumb display "charset=Ascii" before the comment. This is bad. |
This behaviour has been changed for good reason. How difficult is it for gthumb to detect and ignore the |
I don't know, but one issue is that this wasn't announced, so that there was no chance to warn developers and block the upgrade until the applications have been updated. The consequence is that a bug suddenly appeared in gthumb (and other applications, I assume). |
There were 2 release candidates on 2020-04-30 and 2020-05-31 before v0.27.3 shipped on 2020-06-30. The release candidates were announced on Facebook and the forum https://discuss.pixls.us There was an opportunity to provide me feedback. Please understand that I'm working on my own. I don't have any global view of who and why people use Exiv2. I always do my best. I don't always succeed. This doesn't feel very important or painful to me. |
Note that I'm just an end user of gthumb via a binary distribution, so I obviously couldn't check. I don't know why gthumb developers did not notice the issue. I have just reported a bug in its BTS: https://gitlab.gnome.org/GNOME/gthumb/-/issues/137 |
Thank You, @vinc17fr I notified the community in mid-March of my plan to release Exiv2 v0.27.3. The primary motivation for v0.27.3 concerned charset= handling. The proposal was executed as defined. Release candidates were provided as scheduled. I'm reluctant to revert the 0.27.3 behaviour as this may cause an avalanche of further criticism. Exiv2 is more-or-less a one man project. I cannot know the impact of every change on every user. For example, I've never heard of I understand that you are upset by this change. I am interested to know if the gthumb engineers tested their code with the release candidates for Exiv2 v0.27.3. |
FYI, it is also buggy in nomacs (via Panels → Metadata Info, then Exif → Photo), so that there may be something wrong concerning the communication. However, this is much less an issue in nomacs than in gThumb, since contrary to gThumb, nomacs primarily uses "Image Description" rather than "User Comment". Well, the good point about this is that I've found a better image viewer than gThumb. |
What do you want from me? We have discussed this and there is nothing more to be said. Please leave me alone. |
…1049) Converting the UserComment exif metadatum to string will result in its direct/quasi-internal string representation of libexiv2, which may include a "charset=..." prefix with the charset of the value. Since we want the actual content/value of UserComment, and the Exiv2::Value held by Exiv2::Exifdatum is Exiv2::CommentValue, then cast it to call comment(). The result is converted to QString using QString::fromStdString(), which converts std::string as UTF-8 string. For further details, see also the exiv2 ticket: Exiv2/exiv2#1258 Signed-off-by: Pino Toscano <toscano.pino@tiscali.it>
One of the highlights of Exiv2 v0.27.3 is "Improved charset handling in UserComment". I scrolled through the change list, but did not find details to this item.
What I observed: in the previous version, toString() returned the value including the leading "charset=...", whereas print() returned it without. Now in both cases I get the leading "charset=...".
Is this changed behaviour the improved handling or did something else change?
From my point of view the previous behaviour was the better one. print() returned a value which is good for users who just want to see the value, whereas toString() gives the additional information about the charset for users interested in it.