Prerequisites
- I have written a descriptive issue title
- I have verified that I am running the latest version of ImageSharp
- I have verified if the problem exists in both DEBUG and RELEASE mode
- I have searched open and closed issues to ensure it has not already been reported
ImageSharp version
3.1.7
Other ImageSharp packages and versions
None (only the main package is installed)
Environment (Operating system, version and so on)
Windows 11
.NET Framework version
9.0.200
Description
When writing Unicode text to the EXIF UserComment tag, ImageSharp is using UTF-16LE encoding instead of UTF-16BE encoding as required by the EXIF specification.
According to the EXIF specification (JEITA CP-3451, section 4.6.5 "User Comments"), when using Unicode encoding, the first 8 bytes should be "UNICODE\0" followed by text encoded in UTF-16BE.
However, ImageSharp is storing the text in UTF-16LE, which causes the UserComment to be displayed incorrectly in many image viewers and editors.
The root cause of the issue is in the ExifEncodedStringHelpers.cs file (lines 53-60), where Encoding.Unicode is used, which represents UTF-16LE. It should be using Encoding.BigEndianUnicode instead.
Additionally, the same issue exists when reading UserComment values. In the TryGetEncodedStringValue method, when detecting "UNICODE" encoding, it also uses Encoding.Unicode (UTF-16LE) instead of Encoding.BigEndianUnicode (UTF-16BE) to decode the value, which means ImageSharp cannot correctly read UserComment values that are properly encoded according to the EXIF specification.
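To illustrate the expected layout, here is a minimal sketch that uses only the .NET base class library. EncodeUserComment and DecodeUserComment are hypothetical helpers written for this report, not ImageSharp APIs; they show the 8-byte "UNICODE\0" header followed by UTF-16BE text as described above.

```csharp
using System;
using System.Linq;
using System.Text;

// The 8-byte character code that marks a Unicode UserComment: "UNICODE" + NUL.
byte[] unicodeCode = Encoding.ASCII.GetBytes("UNICODE\0");

// Hypothetical helper: header followed by the text encoded as UTF-16BE.
byte[] EncodeUserComment(string text)
{
    byte[] payload = Encoding.BigEndianUnicode.GetBytes(text);
    byte[] result = new byte[unicodeCode.Length + payload.Length];
    unicodeCode.CopyTo(result, 0);
    payload.CopyTo(result, unicodeCode.Length);
    return result;
}

// Hypothetical helper: checks the header, then decodes the rest as UTF-16BE.
string DecodeUserComment(byte[] data)
{
    if (data.Length < unicodeCode.Length ||
        !data.Take(unicodeCode.Length).SequenceEqual(unicodeCode))
    {
        throw new FormatException("Not a UNICODE UserComment.");
    }

    return Encoding.BigEndianUnicode.GetString(
        data, unicodeCode.Length, data.Length - unicodeCode.Length);
}

// "H" is written as 00 48 after the header.
Console.WriteLine(BitConverter.ToString(EncodeUserComment("H")));
// prints 55-4E-49-43-4F-44-45-00-00-48
```

Swapping Encoding.BigEndianUnicode for Encoding.Unicode in this sketch changes the last two bytes to 48 00, which is the byte order reported in this issue.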
This issue requires careful consideration for backward compatibility. Users who have been writing UserComment tags with previous versions of ImageSharp have data encoded in UTF-16LE format. If the fix simply switches to UTF-16BE encoding for both reading and writing, those existing images would have their UserComment values read incorrectly. A potential solution might involve detecting the byte order or providing migration options to ensure both existing and new data can be handled correctly.
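One possible shape for the byte-order detection is sketched below. GuessUtf16ByteOrder is a hypothetical helper, not part of ImageSharp, and the heuristic is an assumption: it checks for a byte order mark, then compares where zero bytes fall, and otherwise defaults to UTF-16BE. Text without any ASCII-range characters (for example, pure Japanese) produces no zero bytes, so it would fall through to that default.

```csharp
using System;
using System.Text;

// Hypothetical helper: guesses the byte order of the UTF-16 payload that
// follows the 8-byte "UNICODE\0" header.
Encoding GuessUtf16ByteOrder(byte[] payload)
{
    // An explicit byte order mark settles the question.
    if (payload.Length >= 2)
    {
        if (payload[0] == 0xFE && payload[1] == 0xFF) return Encoding.BigEndianUnicode;
        if (payload[0] == 0xFF && payload[1] == 0xFE) return Encoding.Unicode;
    }

    int zeroFirst = 0;  // zero in the first byte of a pair: typical of UTF-16BE
    int zeroSecond = 0; // zero in the second byte of a pair: typical of UTF-16LE
    for (int i = 0; i + 1 < payload.Length; i += 2)
    {
        if (payload[i] == 0) zeroFirst++;
        if (payload[i + 1] == 0) zeroSecond++;
    }

    // Ties (including no zero bytes at all) fall back to UTF-16BE.
    return zeroSecond > zeroFirst ? Encoding.Unicode : Encoding.BigEndianUnicode;
}

// A payload written in UTF-16LE is still decoded correctly.
byte[] legacyPayload = Encoding.Unicode.GetBytes("Hello");
Console.WriteLine(GuessUtf16ByteOrder(legacyPayload).GetString(legacyPayload));
// prints Hello
```

A heuristic like this can misjudge short or unusual strings, so an explicit option for callers may still be worth considering alongside it.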
Steps to Reproduce
You can reproduce this issue with the following code, or check the complete reproduction repository at: https://github.com/nirvash/ImageSharpExifUserCommentEncodingBug
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Metadata.Profiles.Exif;
// Load an image
using var image = Image.Load("sample.jpg");
// Create EXIF profile if it doesn't exist
var exif = image.Metadata.ExifProfile ?? new ExifProfile();
// Set Unicode text in UserComment
exif.SetValue(ExifTag.UserComment, "Hello World! こんにちは世界");
// Apply EXIF profile to the image
image.Metadata.ExifProfile = exif;
// Save the image
image.Save("output.jpg");
When examining the UserComment value in the saved image file, you can see that the text is encoded in UTF-16LE instead of UTF-16BE.
When checking with a binary editor, after the "UNICODE\0" header, the letter "H" is stored as 48 00 (the UTF-16LE byte order) instead of 00 48.
According to the EXIF specification, it should be stored in UTF-16BE encoding with the byte order 00 48 for the letter "H".
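If a binary editor is not at hand, the same check can be done with a small sketch like the one below. BytesAfterUnicodeHeader is a hypothetical helper that searches the raw file bytes for the first "UNICODE\0" marker and prints the bytes that follow it. It assumes the marker does not also appear earlier in the file for an unrelated reason, and that output.jpg from the reproduction code above is in the working directory.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: returns up to `count` bytes (as hex) that follow the
// first "UNICODE\0" marker in `data`, or an empty string if none is found.
string BytesAfterUnicodeHeader(byte[] data, int count)
{
    byte[] marker = Encoding.ASCII.GetBytes("UNICODE\0");
    int index = data.AsSpan().IndexOf(marker.AsSpan());
    if (index < 0) return string.Empty;

    int start = index + marker.Length;
    int length = Math.Min(count, data.Length - start);
    return length <= 0 ? string.Empty : BitConverter.ToString(data, start, length);
}

if (File.Exists("output.jpg"))
{
    // 48-00 indicates UTF-16LE; 00-48 indicates UTF-16BE.
    Console.WriteLine(BytesAfterUnicodeHeader(File.ReadAllBytes("output.jpg"), 2));
}
```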