WAV file problem combining non-latin RIFF and ID3v2 tags #338

Closed
paulijar opened this issue Aug 26, 2021 · 4 comments

paulijar commented Aug 26, 2021

I ran into the following problem when testing WAV files tagged with the Mp3tag application. It seemed to happen only when the tags contained mixed Latin and non-Latin scripts, although I didn't test this extensively.

When tagging with non-Latin characters, Mp3tag writes the UTF-8-encoded data to the ID3v2.3 tags and a "substitute string" to the RIFF header. In the substitute string, every non-Latin character is replaced with a ? character. When getID3 then combines the different kinds of tags with CopyTagsToComments, it cannot merge these RIFF and ID3v2.3 tags properly. Instead, the [comments] section of the result contains both versions of each tag and, what's worse, the RIFF version with all those ? characters comes first.
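To make the "substitute string" concrete, here is a minimal Python sketch of a lossy ISO 8859-1 conversion. It is only an illustration of the effect described above, not Mp3tag's actual code; the function name `latin1_substitute` and the sample title are made up for the example.

```python
def latin1_substitute(text):
    """Return text as it looks after a lossy ISO 8859-1 round trip."""
    # errors="replace" makes the codec emit "?" for every character
    # that ISO 8859-1 cannot represent.
    return text.encode("latin-1", errors="replace").decode("latin-1")


# Hypothetical tag value mixing Cyrillic and Latin script.
title = "Привет World"
print(latin1_substitute(title))  # ?????? World
```

Characters that do exist in ISO 8859-1, such as é, pass through unchanged, which would explain why the problem only shows up with non-Latin scripts.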

Meanwhile, if the same tag contents are saved to an mp3 file, the strategy used by Mp3tag is pretty much the same: the UTF-8 data goes to ID3v2.3 and the corresponding substitute string goes to ID3v1. But in this case, getID3 is smart enough to merge the tags so that the [comments] field contains only the UTF-8-encoded data.
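The kind of merge I would expect for both file types can be sketched roughly as follows. This is Python written purely for illustration: getID3 is a PHP library and its real merge logic may differ, and the names `is_substitute_of` and `merge_values` are hypothetical. The assumed rule is that a value is dropped when it equals an already kept value everywhere except where it has a ? character.

```python
def is_substitute_of(candidate, full):
    """True if candidate equals full except at positions where candidate has '?'."""
    if len(candidate) != len(full):
        return False
    return all(c == f or c == "?" for c, f in zip(candidate, full))


def merge_values(preferred, fallback):
    """Keep all preferred values; add a fallback value only if it is not
    just a '?'-substituted copy of a value that is already kept."""
    merged = list(preferred)
    for value in fallback:
        if not any(is_substitute_of(value, kept) for kept in merged):
            merged.append(value)
    return merged


# Hypothetical values: full Unicode title from ID3v2, lossy copy from ID3v1/RIFF.
print(merge_values(["Привет World"], ["?????? World"]))
```

With this rule the lossy copy is recognized as a duplicate and only the Unicode value remains, while genuinely different values from the fallback tag are still appended.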

I have uploaded a pair of sample files here, one wav and one mp3, both defining the same title/album/artist tags:
https://drive.google.com/drive/folders/1qevkYHRrmPvN5lFYaaxgJVOfF9WK4e5h?usp=sharing

Here are the corresponding analysis results after CopyTagsToComments:
analyze_results_mp3.txt
analyze_results_wav.txt

This was detected in getID3 version 1.9.20-202107131440.

JamesHeinrich commented Aug 28, 2021

Should be fixed in 5f6d2ac
Thanks for the sample files.

paulijar commented

Thanks for the quick response and action. However, I can still find some residual cases. One of them is a real-life music file from one of my users, with the title 永夜抄 ~ Eastern Night. Here, the problematic part seems to be the full-width space following the Japanese kanji characters, as well as the full-width tilde next to it. When Mp3tag transliterates these to an ISO 8859-1-compatible format, it converts them into a normal space and a normal tilde, and apparently getID3 only looks for characters that were replaced with ?. I'm sure many other punctuation characters also have full-width variants that are used in Chinese and Japanese.
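One possible way to catch this case, sketched in Python for illustration only: fold the full-width characters with Unicode NFKC normalization before doing the lossy ISO 8859-1 conversion, and compare the result with the legacy tag value. This is a technique I am suggesting, not something getID3 is known to do. The escapes `\u3000` and `\uff5e` are my assumption of the exact code points in the title, and the function names are hypothetical.

```python
import unicodedata


def to_latin1_substitute(text):
    """Fold compatibility characters, then replace what Latin-1 still cannot hold."""
    # NFKC maps full-width forms to their ordinary counterparts, e.g.
    # U+3000 IDEOGRAPHIC SPACE -> U+0020 and U+FF5E FULLWIDTH TILDE -> "~".
    folded = unicodedata.normalize("NFKC", text)
    return folded.encode("latin-1", errors="replace").decode("latin-1")


def looks_like_substitute(candidate, full):
    """True if candidate is what a transliterating tagger could produce from full."""
    return candidate == to_latin1_substitute(full)


# Assumed reconstruction of the title: kanji, U+3000, U+FF5E, then Latin text.
title = "永夜抄\u3000\uff5e Eastern Night"
riff_value = "??? ~ Eastern Night"

print(looks_like_substitute(riff_value, title))  # True
```

A plain position-by-position ? comparison fails here, because the legacy value has a normal space and tilde where the Unicode value has their full-width variants. NFKC would not cover every case either: characters without a compatibility mapping, such as U+301C WAVE DASH, are left untouched.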

I have uploaded new files non-latin2.wav and non-latin2.mp3 to the previous share to demonstrate this full-width character issue. The problem is present in both the wav and the mp3 file. However, with mp3 it is not really a problem for my app, because the ID3v1 tags are placed last and I only read the first tag of each kind. Would it be possible to move the RIFF tags so that they are handled after ID3v2 in a similar manner?

JamesHeinrich commented

Good idea. ID3v1 and RIFF tags are now both processed after any other tag types (if present) so the first entry in comments is more likely to be accurate.
4e02ed0
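The ordering idea can be sketched like this. It is a Python illustration only, not the code from the commit above; the tag-type keys (`"riff"`, `"id3v1"`, `"id3v2"`) and the function names are assumptions made for the example.

```python
# Legacy formats that may only hold a lossy copy of the text (assumed key names).
DEFERRED_TAG_TYPES = ("id3v1", "riff")


def processing_order(tag_types):
    """Stable reorder: every other tag type first, deferred types last."""
    preferred = [t for t in tag_types if t not in DEFERRED_TAG_TYPES]
    deferred = [t for t in tag_types if t in DEFERRED_TAG_TYPES]
    return preferred + deferred


def collect_comments(tags):
    """Collect values per field, visiting the tag types in processing order."""
    comments = {}
    for tag_type in processing_order(tags):
        for field, values in tags[tag_type].items():
            bucket = comments.setdefault(field, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return comments


# Hypothetical parse result in which the RIFF tags were found first.
tags = {
    "riff": {"title": ["??? ~ Eastern Night"]},
    "id3v2": {"title": ["永夜抄\u3000\uff5e Eastern Night"]},
}
print(collect_comments(tags)["title"][0])
```

Even when a lossy duplicate is not detected and both values end up in the list, a consumer that reads only the first entry of each field gets the ID3v2 value.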

paulijar commented Sep 1, 2021

Thanks, works fine for my use cases now.
