-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem with CopyTagsToComments #238
Comments
Can you please provide a sample file for testing, either here or to info@getid3.org |
I sent you a mail. |
Thank you for the sample file. The The problem lies in the detection of "wrong" ID3v1 encodings in module.tag.id3v1.php lines 68-87. This code block is supposed to detect Cyrillic text in Windows-1251 or KOI8-R text encodings but it's clearly not working as intended in your example. I will need to look at the problem further to see if I can come up with a sensible solution. To a human who is familiar with the language it might be obvious as to whether it should read "Petäjäveräjät" or "Petдjдverдjдt", but there is no real definitive code way of determining the intended encoding automatically. Really there shouldn't be any need to guess, because all ID3v1 tags should be in ISO-8859-1 but there are a vast number of files in the wild that have been tagged according to "local" encoding standards for characters sets not supported by ISO-8859-1 (including Cyrillic). I will ponder the problem further and post back with what I decide to do. |
Prevent ID3v1 encoding from being falsely detected as Windows-1251 #238 Add option encoding_id3v1_autodetect to disable auto-detecting ID3v1 encodings entirely
Turns out there was actually a design flaw in the ID3v1 encoding autodetection. It should be fixed in 5100f75 Thanks for the bug report and sample file. |
Thanks for the quick resolution. However, there is still a related issue remaining with files tagged with non-Latin scripts. At least when using foobar2000 to tag files with such scripts, what happens, is that the ID3v1 tags will get filled with '?' characters which are used as substitution for those characters not supported in ISO-8859-1. Then after Here is an example file, where I entered some Cyrillic, Arabic, and Chinese characters to the tags using foobar2000: |
CopyTagsToComments should ignore badly-translitterated ID3v1 tags when they appear to be equivalent to the value stored in another tag format that supports multibyte characters. For example, tagging a file with "中文标题" in Foobar2000 will store "????" as the title in ID3v1. #238
Thanks again for the new sample file. This should be resolved in 4b3c92a |
Great, thanks! Works for me now. As a minor nitpick, the Cyrillic Artist field from the |
One change included in my above commit is that ID3v1 is forced to be processed last, which makes it easier for me to detect bad transliterations in ID3v1, but as a side effect means that you're more likely to get a "good" variant in the first position of [comments] if for some reason multiple representations are not auto-ignored. In my testing your sample file the Album (Arabic), Artist (Cyrillic), and Title (Chinese) are all correctly detected and appropriately ignored. I'm not sure why it would be different on your system. I also made a small change in 9c67941 related to a potential mismatch between comments and comments_html, but that's unrelated to the Cyrillic artist issue. |
That "Ԙ" is indeed quite exotic character and maybe it's not universally supported for this reason. According to Wikipedia, it's only used in some non-Slavic minority languages within Russia. That, however, is not the significant factor here. It turned out, that the question marks are shown on any Cyrillic field if and only if the option |
Ignore my comments about "Ԙ", apparently that character isn't present in fonts that ship with Windows 7, on which I was testing this (including the Wikipedia article). I checked on a Win10 machine and it shows up fine. |
I think that the function
CopyTagsToComments
doesn't work as intended after 96430a1. I believe that the purpose of this function is to enable consistent tag retrieval for the application developers regardless of the tag format (ID3v1/2, APE, ...). Especially on files with both ID3v1 and ID3v2 tags, the function should make arbitration between the tag types and select the best available tag.After the above mentioned commit, the result of
CopyTagsToComments
does not arbitrate between ID3v1 and ID3v2. Instead, thecomments
field will contain data from both tag types if there are any differences. On my tests, the data from ID3v1 always came first and the data from ID3v2 in the second cell. As my application has been expecting to get only single value, it is now using the ID3v1 tag, which is often the worse alternative: every field is truncated to 30 characters, and non-ASCII characters are corrupted, which might actually be another bug. I think that my ID3v1 tags contain properly ISO 8859-1 encoded accented characters like ä, ö, and é, but all of those get converted to some Cyrillic characters.The text was updated successfully, but these errors were encountered: