
WWW → url_user → Mangled utf16/8? #119

Closed
Svish opened this issue Aug 1, 2017 · 7 comments

Comments

Svish commented Aug 1, 2017

I tried setting the WWW Extended Field to https://www.triangelos.net using Mp3tag v2.83, and when I looked for it in the output from getID3 I couldn't find it. Then I saw it was called url_user, but its content was mangled.

How it looks in the tags array: [screenshot]

How it looks in the "source"(?) array: [screenshot]

Is something going wrong in the copying there? 🤔

If it helps, I exported the resulting array using var_export: var_export.txt

Svish commented Aug 1, 2017

I tried the WOAR (Official artist/performer webpage) Extended Field instead, and there the text comes out correctly... 🤔

[screenshot]

JamesHeinrich commented Aug 1, 2017

Can you send me a sample of the .mp3 so I can see what's happening, please?
(post a link, attach file directly, or email to info@getid3.org)

Svish commented Aug 1, 2017

Tried emailing you one now. 👍

JamesHeinrich commented

I have looked at the file, and basically your tagger (Mp3tag v2.83) is broken. It's writing the WXXX (url_user) ID3v2 frame with a text encoding of UTF-16, but the data it's writing is ISO-8859-1 (or UTF-8; I can't tell with this example). getID3 sees the UTF-16 flag on the WXXX frame and parses the contents accordingly, so each pair of bytes becomes one character and you end up with 13 (Chinese?) characters instead of 26 Latin characters.
The WOAR (url_artist) field, on the other hand, is written correctly: the encoding is set to ISO-8859-1 and the data is written accordingly, so it shows up correctly.
The data is written identically for both fields; only the encoding flag on WXXX is set incorrectly.

You may wish to report this to Mp3tag authors as a bug.
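A quick way to see why 26 characters turn into 13: decode the URL's single-byte ISO-8859-1 bytes as UTF-16, which consumes two bytes per character. This Python sketch assumes little-endian byte order and leaves out the BOM and description bytes that the later discussion shows are also present in the frame; it illustrates the symptom only, not getID3's code.

```python
# Reproduces the symptom: the URL's 26 single-byte characters, read back as
# UTF-16, collapse into 13 two-byte code units.
url = "https://www.triangelos.net"
raw = url.encode("iso-8859-1")  # one byte per character

# Assumption: little-endian byte order; BOM and description bytes ignored here.
misread = raw.decode("utf-16-le")

print(len(raw), len(misread))  # 26 13
print(misread)  # unreadable text, not the URL
```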

Svish commented Aug 1, 2017

That makes sense. I've sent them an email about it. Thank you for looking into it 🙂

Svish commented Aug 2, 2017

I got the following response. It seems the URL in W* frames is supposed to be written in ISO-8859-1 regardless of the encoding field, and the encoding field only applies to the description?

Here is a quote from the ID3v2.4 specification:

4.3.2. User defined URL link frame

This frame is intended for URL [URL] links concerning the audio file
in a similar way to the other "W"-frames. The frame body consists
of a description of the string, represented as a terminated string,
followed by the actual URL. The URL is always encoded with ISO-8859-1
[ISO-8859-1]. There may be more than one "WXXX" frame in each tag,
but only one with the same description.

<Header for 'User defined URL link frame', ID: "WXXX">
Text encoding     $xx
Description       <text string according to encoding> $00 (00)
URL               <text string>

So, I think what the PHP library you're using is ignoring is that every user-defined URL link frame (WXXX) also consists of a text encoding flag and a (possibly empty) description part. The text encoding flag is always written (in your example it was set to 1, which means UTF-16 in ID3v2 parlance). An empty description is still written as a text string according to the encoding: in your case a UTF-16 BOM 0xFEFF followed by a terminating UTF-16 null character 0x0000. After that, the URL is written as ISO-8859-1.
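The byte layout described in that reply can be sketched in Python. `build_wxxx_body` is a hypothetical helper, not part of getID3 or Mp3tag; it omits the 10-byte frame header and assumes the encoding-1 BOM is serialized little-endian (`FF FE`).

```python
def build_wxxx_body(url: str, description: str = "") -> bytes:
    """Assemble a WXXX frame body (frame header omitted) with text encoding $01.

    Hypothetical helper for illustration; assumes a little-endian BOM.
    """
    encoding_flag = b"\x01"  # $01 = UTF-16 with BOM; applies to the description only
    bom = b"\xff\xfe"  # U+FEFF serialized little-endian
    desc = description.encode("utf-16-le")
    terminator = b"\x00\x00"  # UTF-16 null character ends the description
    # The URL is always ISO-8859-1 and is not null-terminated.
    url_bytes = url.encode("iso-8859-1")
    return encoding_flag + bom + desc + terminator + url_bytes


body = build_wxxx_body("https://www.triangelos.net")
print(body[:5].hex(" "))  # 01 ff fe 00 00
print(len(body))          # 31
```

With an empty description, the first five bytes are exactly the encoding flag, BOM and terminator from the reply; the remaining 26 bytes are the plain ISO-8859-1 URL.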

JamesHeinrich reopened this Aug 2, 2017
JamesHeinrich added a commit that referenced this issue Aug 2, 2017

JamesHeinrich commented Aug 2, 2017

Indeed, the Mp3tag authors are quite right. I have patched getID3 accordingly. Your test file should show up correctly now.
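One way to express the corrected handling is shown below as a hedged Python sketch, not getID3's actual PHP patch (which isn't shown in this thread). It reads the encoding byte, decodes the description up to its null terminator using that encoding, then decodes everything after the terminator as ISO-8859-1. `parse_wxxx_body` and `_find_terminator` are hypothetical names. Scanning UTF-16 in 2-byte steps matters: a description like "Hi" ends in `69 00 00 00`, where an unaligned search would find `00 00` one byte too early.

```python
def _find_terminator(data: bytes, width: int) -> int:
    """Offset of the first null terminator, scanning in steps of `width`
    so that UTF-16 code units stay aligned."""
    null = b"\x00" * width
    for i in range(0, len(data) - width + 1, width):
        if data[i:i + width] == null:
            return i
    raise ValueError("description terminator not found")


def parse_wxxx_body(body: bytes) -> tuple[str, str]:
    """Split a WXXX frame body into (description, url). Hypothetical helper."""
    encoding, rest = body[0], body[1:]
    # Text encoding byte: $00 ISO-8859-1, $01 UTF-16 with BOM,
    # $02 UTF-16BE, $03 UTF-8 ($02 and $03 exist only in ID3v2.4).
    codecs = {0: "iso-8859-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}
    if encoding not in codecs:
        raise ValueError(f"unknown text encoding {encoding}")
    width = 2 if encoding in (1, 2) else 1
    end = _find_terminator(rest, width)
    description = rest[:end].decode(codecs[encoding])
    # The encoding byte applies only to the description; the URL is always ISO-8859-1.
    url = rest[end + width:].decode("iso-8859-1")
    return description, url


print(parse_wxxx_body(b"\x01\xff\xfe\x00\x00https://www.triangelos.net"))
# ('', 'https://www.triangelos.net')
```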

StudioMaX pushed a commit to StudioMaX/getID3 that referenced this issue May 26, 2019