Misinterpreting encoding of exif copyright message #270

Open
davidekholm opened this issue Jun 20, 2017 · 10 comments

@davidekholm
Contributor

commented Jun 20, 2017

The attached sample image contains an Exif copyright message that is UTF-8 encoded, but metadata-extractor misinterprets it as ISO-8859-1. It is decoded correctly on Mac (where the system property file.encoding is UTF-8), but incorrectly on Windows.

[Attached image: vecht sahara 140328 1005]
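
For readers who want to see the symptom in isolation, here is a minimal sketch. It assumes a hypothetical copyright string (the actual text stored in the sample image is not quoted in this issue) and does not use metadata-extractor at all; it only shows what happens when UTF-8 bytes are decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class CopyrightMojibakeDemo {
    // Decode the raw tag bytes as UTF-8 (what the file actually contains).
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decode the same bytes as ISO-8859-1 (the misinterpretation reported here).
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical value: a copyright sign, a space and a name with an accented letter.
        String original = "\u00A9 Andr\u00E9";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Each non-ASCII character occupies two bytes in UTF-8, so decoding
        // byte-per-character yields two wrong characters for each of them.
        // How the console renders these lines depends on the terminal encoding.
        System.out.println(decodeUtf8(raw));
        System.out.println(decodeLatin1(raw));
        System.out.println(original.length() + " chars vs " + decodeLatin1(raw).length() + " chars");
    }
}
```

The wrong decoding never throws; it silently produces a longer string, which is why the problem tends to surface only when someone looks at the text.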

@acwolff


commented Jul 25, 2017

I have this problem too!

@mateusz-fiolka


commented Nov 21, 2017

The same problem for me, but for the caption/description IPTC field, on OSX in my case.

@Nadahar

Contributor

commented Nov 21, 2017

For anyone who wants to solve this: it is caused by not specifying a character encoding when converting from bytes to a string. Many methods have overloads with and without a Charset parameter, and when the overload without this parameter is used, Java uses the "default Charset" for the conversion.

This default charset can't be trusted to be what you want and should almost never be used. It's set when the JVM is launched, and can be specified by whoever starts the application. It defaults to the OS's "standard" encoding, which on Windows is a localized code page kept for backwards compatibility. On macOS and Linux it's usually UTF-8.
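
To make the difference between the two kinds of overload concrete, here is a small sketch. The class and method names are made up for illustration; only `Charset.defaultCharset()` and the two `String` constructors are real JDK API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Overload WITHOUT a Charset: the result depends on the JVM's default charset.
    static String decodeWithDefault(byte[] raw) {
        return new String(raw);
    }

    // Overload WITH a Charset: the result is the same on every platform.
    static String decodeExplicit(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9

        System.out.println("default charset: " + Charset.defaultCharset());
        // Length is 1 where the default is UTF-8, and 2 where it is a
        // single-byte code page such as ISO-8859-1.
        System.out.println("default decode length:  " + decodeWithDefault(raw).length());
        System.out.println("explicit decode length: " + decodeExplicit(raw, StandardCharsets.UTF_8).length());
    }
}
```

On the Java versions that were current when this was written, starting the JVM with something like `-Dfile.encoding=ISO-8859-1` usually changes the first two lines of output but never the third; treat the exact effect of that property as an assumption to verify on your own JVM.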

FindBugs can be used to quickly find all places in the code where there is reliance on the default encoding.

In case FindBugs isn't installed/configured, here's the status for the current master when it comes to default encoding reliance:

Source/com/drew/metadata/mov/QuickTimeDescriptor.java:61 Found reliance on default encoding in com.drew.metadata.mov.QuickTimeDescriptor.getMajorBrandDescription(): new String(byte[]) [Of Concern(19), High confidence]
Tests/com/drew/lang/ByteTrieTest.java:41 Found reliance on default encoding in com.drew.lang.ByteTrieTest.testBasics(): String.getBytes() [Of Concern(19), High confidence]
Source/com/drew/lang/SequentialReader.java:314 Found reliance on default encoding in com.drew.lang.SequentialReader.getString(int, String): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/lang/SequentialReader.java:304 Found reliance on default encoding in com.drew.lang.SequentialReader.getString(int): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/adobe/AdobeJpegReader.java:55 Found reliance on default encoding in com.drew.metadata.adobe.AdobeJpegReader.readJpegSegments(Iterable, Metadata, JpegSegmentType): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/StringValue.java:74 Found reliance on default encoding in com.drew.metadata.StringValue.toString(Charset): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/mp4/Mp4Dictionary.java:131 Found reliance on default encoding in com.drew.metadata.mp4.Mp4Dictionary.<static initializer for Mp4Dictionary>(): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/photoshop/DuckyReader.java:57 Found reliance on default encoding in com.drew.metadata.photoshop.DuckyReader.readJpegSegments(Iterable, Metadata, JpegSegmentType): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/mov/QuickTimeDictionary.java:135 Found reliance on default encoding in com.drew.metadata.mov.QuickTimeDictionary.<static initializer for QuickTimeDictionary>(): new String(byte[]) [Of Concern(19), High confidence]
Tests/com/drew/metadata/exif/ExifSubIFDDescriptorTest.java:69 Found reliance on default encoding in com.drew.metadata.exif.ExifSubIFDDescriptorTest.testUserCommentDescription_ZeroLengthAscii1(): String.getBytes() [Of Concern(19), High confidence]
Tests/com/drew/metadata/exif/ExifSubIFDDescriptorTest.java:80 Found reliance on default encoding in com.drew.metadata.exif.ExifSubIFDDescriptorTest.testUserCommentDescription_ZeroLengthAscii2(): String.getBytes() [Of Concern(19), High confidence]
Source/com/drew/tools/ProcessAllImagesInFolderUtility.java:576 Found reliance on default encoding in com.drew.tools.ProcessAllImagesInFolderUtility$MarkdownTableOutputHandler.writeOutput(PrintStream): new java.io.OutputStreamWriter(OutputStream) [Of Concern(19), High confidence]
Source/com/drew/tools/ProcessAllImagesInFolderUtility.java:557 Found reliance on default encoding in com.drew.tools.ProcessAllImagesInFolderUtility$MarkdownTableOutputHandler.onScanCompleted(PrintStream): new java.io.PrintStream(OutputStream, boolean) [Of Concern(19), High confidence]
Source/com/drew/metadata/mov/metadata/QuickTimeDataHandler.java:76 Found reliance on default encoding in com.drew.metadata.mov.metadata.QuickTimeDataHandler.processAtom(Atom, byte[]): String.getBytes() [Of Concern(19), High confidence]
Tests/com/drew/metadata/exif/ExifSubIFDDescriptorTest.java:48 Found reliance on default encoding in com.drew.metadata.exif.ExifSubIFDDescriptorTest.testUserCommentDescription_AsciiHeaderAsciiEncoding(): String.getBytes() [Of Concern(19), High confidence]
Source/com/drew/metadata/mov/metadata/QuickTimeDataHandler.java:104 Found reliance on default encoding in com.drew.metadata.mov.metadata.QuickTimeDataHandler.processData(byte[], SequentialByteArrayReader): new String(byte[]) [Of Concern(19), High confidence]
Tests/com/drew/metadata/exif/ExifSubIFDDescriptorTest.java:58 Found reliance on default encoding in com.drew.metadata.exif.ExifSubIFDDescriptorTest.testUserCommentDescription_BlankAscii(): String.getBytes() [Of Concern(19), High confidence]
Tests/com/drew/metadata/exif/ExifSubIFDDescriptorTest.java:38 Found reliance on default encoding in com.drew.metadata.exif.ExifSubIFDDescriptorTest.testUserCommentDescription_EmptyEncoding(): String.getBytes() [Of Concern(19), High confidence]
Source/com/drew/metadata/mov/metadata/QuickTimeDataHandler.java:93 Found reliance on default encoding in com.drew.metadata.mov.metadata.QuickTimeDataHandler.processKeys(SequentialByteArrayReader): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/mov/metadata/QuickTimeDataHandler.java:62 Found reliance on default encoding in com.drew.metadata.mov.metadata.QuickTimeDataHandler.shouldAcceptContainer(Atom): String.getBytes() [Of Concern(19), High confidence]
Source/com/drew/tools/ProcessAllImagesInFolderUtility.java:75 Found reliance on default encoding in com.drew.tools.ProcessAllImagesInFolderUtility.main(String[]): new java.io.PrintStream(OutputStream, boolean) [Of Concern(19), High confidence]
Source/com/drew/lang/RandomAccessReader.java:390 Found reliance on default encoding in com.drew.lang.RandomAccessReader.getString(int, int, String): new String(byte[]) [Of Concern(19), High confidence]
Tests/com/drew/lang/SequentialAccessTestBase.java:247 Found reliance on default encoding in com.drew.lang.SequentialAccessTestBase.testGetString(): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/avi/AviRiffHandler.java:90 Found reliance on default encoding in com.drew.metadata.avi.AviRiffHandler.processChunk(String, byte[]): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/jfif/JfifReader.java:58 Found reliance on default encoding in com.drew.metadata.jfif.JfifReader.readJpegSegments(Iterable, Metadata, JpegSegmentType): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/exif/makernotes/OlympusMakernoteDescriptor.java:819 Found reliance on default encoding in com.drew.metadata.exif.makernotes.OlympusMakernoteDescriptor.getCameraIdDescription(): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/iptc/IptcReader.java:175 Found reliance on default encoding in com.drew.metadata.iptc.IptcReader.processTag(SequentialReader, Directory, int, int, int): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/photoshop/PhotoshopDescriptor.java:318 Found reliance on default encoding in com.drew.metadata.photoshop.PhotoshopDescriptor.getSimpleString(int): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/Directory.java:475 Found reliance on default encoding in com.drew.metadata.Directory.getInteger(int): String.getBytes() [Of Concern(19), High confidence]
Tests/com/drew/metadata/DirectoryTest.java:210 Found reliance on default encoding in com.drew.metadata.DirectoryTest.testSetStringGetInt(): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/eps/EpsReader.java:256 Found reliance on default encoding in com.drew.metadata.eps.EpsReader.extractXmpData(Metadata, SequentialReader): String.getBytes() [Of Concern(19), High confidence]
Source/com/drew/metadata/exif/ExifDescriptorBase.java:649 Found reliance on default encoding in com.drew.metadata.exif.ExifDescriptorBase.getUserCommentDescription(): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/TagDescriptor.java:267 Found reliance on default encoding in com.drew.metadata.TagDescriptor.get7BitStringFromBytes(int): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/wav/WavRiffHandler.java:111 Found reliance on default encoding in com.drew.metadata.wav.WavRiffHandler.processChunk(String, byte[]): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/imaging/FileTypeDetector.java:46 Found reliance on default encoding in com.drew.imaging.FileTypeDetector.<static initializer for FileTypeDetector>(): String.getBytes() [Of Concern(19), High confidence]
Source/com/drew/imaging/FileTypeDetector.java:183 Found reliance on default encoding in com.drew.imaging.FileTypeDetector.detectFileType(BufferedInputStream): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/mov/metadata/QuickTimeDirectoryHandler.java:86 Found reliance on default encoding in com.drew.metadata.mov.metadata.QuickTimeDirectoryHandler.processData(byte[], SequentialByteArrayReader): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/photoshop/PhotoshopReader.java:65 Found reliance on default encoding in com.drew.metadata.photoshop.PhotoshopReader.readJpegSegments(Iterable, Metadata, JpegSegmentType): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/mov/metadata/QuickTimeDirectoryHandler.java:68 Found reliance on default encoding in com.drew.metadata.mov.metadata.QuickTimeDirectoryHandler.processAtom(Atom, byte[]): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/mp4/Mp4Descriptor.java:59 Found reliance on default encoding in com.drew.metadata.mp4.Mp4Descriptor.getMajorBrandDescription(): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/lang/StringUtil.java:84 Found reliance on default encoding in com.drew.lang.StringUtil.fromStream(InputStream): new java.io.InputStreamReader(InputStream) [Of Concern(19), High confidence]
Source/com/drew/metadata/xmp/XmpReader.java:101 Found reliance on default encoding in com.drew.metadata.xmp.XmpReader.readJpegSegments(Iterable, Metadata, JpegSegmentType): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/icc/IccReader.java:215 Found reliance on default encoding in com.drew.metadata.icc.IccReader.getStringFromInt32(int): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/icc/IccReader.java:70 Found reliance on default encoding in com.drew.metadata.icc.IccReader.readJpegSegments(Iterable, Metadata, JpegSegmentType): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/icc/IccDescriptor.java:90 Found reliance on default encoding in com.drew.metadata.icc.IccDescriptor.getTagDataString(int): new String(byte[], int, int) [Of Concern(19), High confidence]
Source/com/drew/metadata/icc/IccDescriptor.java:339 Found reliance on default encoding in com.drew.metadata.icc.IccDescriptor.getInt32FromString(String): String.getBytes() [Of Concern(19), High confidence]
Source/com/drew/imaging/riff/RiffReader.java:81 Found reliance on default encoding in com.drew.imaging.riff.RiffReader.processChunks(SequentialReader, int, RiffHandler): new String(byte[]) [Of Concern(19), High confidence]
Source/com/drew/metadata/jfxx/JfxxReader.java:58 Found reliance on default encoding in com.drew.metadata.jfxx.JfxxReader.readJpegSegments(Iterable, Metadata, JpegSegmentType): new String(byte[], int, int) [Of Concern(19), High confidence]
Tests/com/drew/lang/CompoundExceptionTest.java:74 Found reliance on default encoding in com.drew.lang.CompoundExceptionTest.testNoInnerException(): new java.io.PrintWriter(OutputStream) [Of Concern(19), High confidence]
Tests/com/drew/lang/CompoundExceptionTest.java:72 Found reliance on default encoding in com.drew.lang.CompoundExceptionTest.testNoInnerException(): new java.io.PrintStream(OutputStream) [Of Concern(19), High confidence]
Source/com/drew/metadata/exif/ExifReader.java:62 Found reliance on default encoding in com.drew.metadata.exif.ExifReader.readJpegSegments(Iterable, Metadata, JpegSegmentType): new String(byte[], int, int) [Of Concern(19), High confidence]
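
The list above boils down to a handful of API families: `String.getBytes()`, `new String(byte[])`, `new InputStreamReader(InputStream)` and `new PrintStream(OutputStream, boolean)`. A rough sketch of the explicit-charset replacement for some of them follows. The helper names are hypothetical and not part of metadata-extractor, and UTF-8 is only a placeholder: the correct charset for each call site depends on what the file format specifies there (for example, four-character codes are typically plain ASCII).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetFixes {
    // Instead of s.getBytes(): name the charset.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Instead of new InputStreamReader(in): pass the charset to the reader.
    static String readAll(InputStream in) {
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Instead of new PrintStream(out, true): use the overload taking an encoding name.
    static PrintStream utf8PrintStream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Wrapping the checked exceptions is just to keep the sketch short; real call sites would more likely let `IOException` propagate.
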
@acwolff


commented Jul 4, 2018

Another year has passed without a solution.

@drewnoakes

Owner

commented Jul 4, 2018

@acwolff you are welcome to submit a PR.

@acwolff


commented Jul 4, 2018

@drewnoakes that is already done by David Ekholm, see the first message in this thread.

@davidekholm

Contributor Author

commented Jul 4, 2018

@Nadahar

Contributor

commented Jul 4, 2018

@acwolff I can't see any PR. Can you link to it?

@acwolff


commented Jul 4, 2018

@Nadahar I think the PR is the first message in this thread.
I did not write a PR, I don’t have the knowledge to do that.
I'm just suffering from this problem.

@Nadahar

Contributor

commented Jul 4, 2018

@acwolff PR is short for "Pull Request". A description for those that don't know how to make one can be found here.
