IPTC tags with special characters such as ø, ç, õ, etc... #1203

Closed
markijsseldijk opened this issue May 9, 2020 · 4 comments

@markijsseldijk

I use exiv2 to tag multiple images in batch. When the tags contain special characters, those characters are not saved correctly after the batch runs.

The batch commands for two sample images:
exiv2 -M"set Iptc.Application2.Caption String "F-OJSE Airbus A330-202 510 AirCalin - Air Calédonie International"" D:\Temp\20B0004.jpg
exiv2 -M"set Iptc.Application2.Caption String "PR-RDD Gulfstream G550 5280 Yamandu Empreendimentos e Participações SA"" D:\Temp\20B0037.jpg

I have used the workaround below, but it is now also causing issues.

exiv2 -M"set Iptc.Application2.Caption String "F-OJSE Airbus A330-202 510 AirCalin - Air Cal\u00E9donie International"" D:\Temp\20B0004.jpg
exiv2 -M"set Iptc.Application2.Caption String "PR-RDD Gulfstream G550 5280 Yamandu Empreendimentos e Participa\u00E7\u00F5es SA"" D:\Temp\20B0037.jpg

Using exiftool, I can specify the character set as Latin as in below, and this allows the special character to be correctly written and subsequently displayed:

exiftool -charset latin
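
For illustration, here is a minimal Python sketch of what a Latin-to-UTF-8 charset conversion amounts to. It assumes, hypothetically, that the Windows shell hands the program bytes in code page 1252. Neither exiv2 nor exiftool is invoked, and the helper name `latin_to_utf8` is made up for this example.

```python
def latin_to_utf8(raw: bytes) -> bytes:
    """Reinterpret cp1252 ("Latin") bytes as text and re-encode them as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


# Assumption: the shell delivers the caption as cp1252 bytes.
raw = "Air Cal\u00e9donie".encode("cp1252")

# In cp1252, é is the single byte 0xE9, which is not valid UTF-8 on its own.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

converted = latin_to_utf8(raw)
print(valid_utf8)
print(converted)
```

If the bytes reach the IPTC block unconverted while a reader expects UTF-8, the accented character would be shown incorrectly, which matches the symptom described above.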

The Exiv2 documentation mentions "-n", but I cannot find any examples of it or of how to use this encoding option in the command lines above.

If someone is able to assist then it will be greatly appreciated.

@clanmills clanmills self-assigned this May 9, 2020
@clanmills clanmills added this to the v0.27.3 milestone May 9, 2020
@clanmills
Collaborator

Thanks for raising this issue. I've never thought about this. In March, I investigated (and solved) using Unicode on Exif.Photo.UserComment. I'll dig around in the code this week-end and update you. For sure, I should add something to the man page about this.

@clanmills clanmills modified the milestones: v0.27.3, v0.28 May 9, 2020
@clanmills clanmills removed the bug label May 9, 2020
@clanmills
Collaborator

I've investigated. I don't know the purpose of the -n/--encoding spec option in the exiv2 command-line program. It doesn't appear to do anything useful for your purposes. I'll explain more below:

In src/iptc.cpp, I found the following:

(*iptcData_)[to] = value;
(*iptcData_)["Iptc.Envelope.CharacterSet"] = "\033%G"; // indicate UTF-8 encoding
if (erase_) xmpData_->erase(pos);

So the data is passed from the command-line into the IPTC data block as binary without being modified. I maintain Exiv2 on macOS and I believe the terminal is UTF-8. So, your command-lines work OK.

2016 rmills@rmillsmbp:~/temp $ curl -LO --silent https://clanmills.com/Stonehenge.jpg
2017 rmills@rmillsmbp:~/temp $ exiv2 -pi Stonehenge.jpg 
Iptc.Envelope.ModelVersion                   Short       1  4
Iptc.Envelope.CharacterSet                   String      3  G
Iptc.Application2.RecordVersion              Short       1  4
Iptc.Application2.Caption                    String     12  Classic View
2018 rmills@rmillsmbp:~/temp $ exiv2 -M'set Iptc.Application2.Caption String "F-OJSE Airbus A330-202 510 AirCalin - Air Calédonie International"' Stonehenge.jpg 
2019 rmills@rmillsmbp:~/temp $ exiv2 -M'set Iptc.Application2.Caption String "PR-RDD Gulfstream G550 5280 Yamandu Empreendimentos e Participações SA"' Stonehenge.jpg 
2020 rmills@rmillsmbp:~/temp $ exiv2 -pi Stonehenge.jpg 
Iptc.Envelope.ModelVersion                   Short       1  4
Iptc.Envelope.CharacterSet                   String      3  G
Iptc.Application2.RecordVersion              Short       1  4
Iptc.Application2.Caption                    String     72  PR-RDD Gulfstream G550 5280 Yamandu Empreendimentos e Participações SA
2021 rmills@rmillsmbp:~/temp $ 
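
The byte counts in the listing above are consistent with UTF-8. A small Python check, offered as a sketch rather than as anything exiv2 does internally: the caption has 70 characters, two of which (ç and õ) take two bytes each in UTF-8, and the `"\033%G"` marker is the three bytes ESC, `%`, `G`.

```python
caption = (
    "PR-RDD Gulfstream G550 5280 Yamandu Empreendimentos "
    "e Participa\u00e7\u00f5es SA"
)
utf8_marker = b"\x1b%G"  # "\033%G" in the C++ fragment: ESC % G

char_count = len(caption)
byte_count = len(caption.encode("utf-8"))

print(char_count)        # 70 characters
print(byte_count)        # 72 bytes, as reported by exiv2 -pi
print(len(utf8_marker))  # 3 bytes, as reported for Iptc.Envelope.CharacterSet
```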

The answers to your questions are:

  1. How do you use -n/--encoding?
    Don't use it. It doesn't do anything.

  2. How do you set the character encoding in IPTC?
    Pass UTF-8 encoded data on the command-line to Iptc.Application2.Caption

So what does -n/--encoding do? It's used internally in the exiv2 command-line parser to define the encoding. Andreas (Huggel) wrote exiv2. He is a very good engineer and probably intended to encode all command-line "String" definitions. However, that hasn't been implemented.

I've documented using UNICODE for Exif.Photo.UserComment, Exif.GPSInfo.GPSProcessingMethod and Exif.GPSInfo.GPSAreaInformation for Exiv2 v0.27.3 (scheduled on 2020-06-30).

The format of Exif Comment values includes an optional charset
specification at the beginning. Comments are used by the tags
Exif.Photo.UserComment, Exif.GPSInfo.GPSProcessingMethod and
Exif.GPSInfo.GPSAreaInformation. Comments are stored as Undefined
tags with an 8 byte encoding definition followed by the encoded
data. The charset is specified as follows:

[charset=Ascii|Jis|Unicode|Undefined] comment
charset=Undefined is the default

$ exiv2 -M'set Exif.Photo.UserComment charset=Ascii My photo' x.jpg
$ exiv2 -pa --grep UserComment x.jpg
Exif.Photo.UserComment	     Undefined	16  My photo
$ exiv2 -pv --grep UserComment x.jpg
0x9286 Photo	 UserComment Undefined	16  charset="Ascii" My photo

$ exiv2 -M'set Exif.Photo.UserComment charset=Unicode \u0052\u006f\u0062\u0069\u006e' x.jpg
$ exiv2 -pa --grep UserComment x.jpg
Exif.Photo.UserComment			   Undefined  18  Robin
$ exiv2 -pv --grep UserComment x.jpg
0x9286 Photo	  UserComment		      Undefined  18  charset="Unicode" Robin

$ exiv2 -M'set Exif.GPSInfo.GPSProcessingMethod HYBRID-FIX' x.jpg
$ exiv2 -pa --grep ProcessingMethod	 x.jpg
Exif.GPSInfo.GPSProcessingMethod		   Undefined  18  HYBRID-FIX
$ exiv2 -pv --grep ProcessingMethod	 x.jpg
0x001b GPSInfo	 GPSProcessingMethod	   Undefined  18  HYBRID-FIX
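
The sizes 16, 18 and 18 in the output above follow from the layout described in the man page text: an 8 byte encoding definition plus the encoded data. Below is a minimal Python sketch of that layout. It is not exiv2's implementation; the function name `build_comment` is hypothetical, Jis is omitted, and the use of UTF-16 little-endian for Unicode and plain ASCII for Undefined are assumptions made for the example.

```python
# 8-byte character codes used at the start of Exif comment values.
CHARSET_CODES = {
    "Ascii": b"ASCII\x00\x00\x00",
    "Unicode": b"UNICODE\x00",
    "Undefined": b"\x00" * 8,
}


def build_comment(charset: str, text: str) -> bytes:
    """Return the 8-byte charset code followed by the encoded comment text."""
    code = CHARSET_CODES[charset]
    if charset == "Unicode":
        # Assumption: UTF-16, little-endian, no byte order mark.
        data = text.encode("utf-16-le")
    else:
        # Assumption: Ascii and Undefined comments hold plain ASCII here.
        data = text.encode("ascii")
    return code + data


print(len(build_comment("Ascii", "My photo")))
print(len(build_comment("Unicode", "\u0052\u006f\u0062\u0069\u006e")))
print(len(build_comment("Undefined", "HYBRID-FIX")))
```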

We could consider implementing -n/--encoding to perform encoding for Exiv2 v0.28. Let's talk more about your needs and expectations.

@markijsseldijk
Author

Thanks so much for your investigation.
I have been trialing various options and this morning got the option to work:

In Notepad, replace each special character with its Unicode escape, so é becomes \u00E9. Then save the .bat file with ANSI encoding (not UTF-8). Then run the commands and the correct character is entered into the tags.

I am happy that I have got it working this way. I prepare the tags in Excel, so it's easy to program the find and replace. If the character replacement can be avoided in future versions of Exiv2, it would certainly make things 'cleaner'.
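
The find and replace step could also be scripted. Here is a minimal Python sketch, assuming the `\uXXXX` form shown in the commands earlier in this thread is what exiv2 accepts. The helper name `escape_non_ascii` is hypothetical, and characters outside the Basic Multilingual Plane are rejected rather than guessed at.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a \\uXXXX escape."""
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            parts.append(ch)  # plain ASCII is kept as-is
        elif code_point <= 0xFFFF:
            parts.append(f"\\u{code_point:04X}")
        else:
            # Four hex digits cannot represent this character.
            raise ValueError(f"cannot escape U+{code_point:X} with four hex digits")
    return "".join(parts)


escaped = escape_non_ascii("Air Cal\u00e9donie International")
print(escaped)  # Air Cal\u00E9donie International
```

The result is pure ASCII, so it can be pasted into the batch file as-is.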

Kind regards

Mark

@clanmills
Collaborator

I'm very happy that you have this working. Being a native English speaker, I am out of my comfort zone when discussing any encoding except ASCII! One of my favourite users (@tester0077) has been asking me to support character set encodings "properly". The biggest puzzle for me is to understand what is wanted and/or expected.

Anyway, thanks for closing this bug. I'm about 96% of the way to Exiv2 v0.27.3 and very happy to see this resolved.
