Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New tags: Copyright, ISRC, Artists, AlbumArtists, URL #42

Merged
merged 5 commits into from
May 17, 2021

Conversation

JuniorJPDJ
Copy link
Contributor

@JuniorJPDJ JuniorJPDJ commented May 1, 2021

@@ -1801,8 +1813,17 @@ def update(self, dict):
MP3DescStorageStyle(u'BARCODE'),
MP4StorageStyle('----:com.apple.iTunes:BARCODE'),
StorageStyle('BARCODE'),
StorageStyle('UPC', read_only=True),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effect I wanted to achieve was reading data from this 3 tags and saving only BARCODE tag.
Is it a way to do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Can you explain a little more about why you want that behavior? (I think this is the way to achieve it, but some testing would be useful, probably.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UPC, UPN and EAN are some specific tags for barcodes which are not supported on most of formats and by most of players.
It would be cool to fill main tag with data from those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it... but I guess what I was asking about was a little more detail on which software does support those fields. It seems somewhat harmless to write fields that are unpopular, if that has the effect of increasing compatibility. (That's the usual policy we use in this library.)

Copy link
Contributor Author

@JuniorJPDJ JuniorJPDJ May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem is that you can't easly guess if it's UPC, UPN or EAN, all of them are barcodes but those are different standards for those.
UPC and UPN are the same, but EAN is other standard, all of them are BARCODEs tho and IMO should fill main BARCODE tag.

I'm not sure which players does support them but those were specified in this index as supported in some standards: https://wiki.hydrogenaud.io/index.php?title=Tag_Mapping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mediafile.py Outdated
@@ -1643,6 +1643,12 @@ def update(self, dict):
StorageStyle('ARTIST'),
ASFStorageStyle('Author'),
)
artists = ListMediaField(
#MP3ListDescStorageStyle(desc='ARTISTS'),
Copy link
Contributor Author

@JuniorJPDJ JuniorJPDJ May 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I do something like this?
I've tag like:

'TXXX:ARTISTS': TXXX(encoding=<Encoding.UTF16: 1>, desc='ARTISTS', text=['Miuosh/Jimek/Narodowa Orkiestra Symfoniczna Polskiego Radia w Katowicach']),

And it's supposed to be interpreted as list: ['Miuosh', 'Jimek', ...]

Copy link
Contributor Author

@JuniorJPDJ JuniorJPDJ May 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source for this tag: https://picard-docs.musicbrainz.org/en/appendices/tag_mapping.html#artists
Example in FLAC:

 'artists': ['Donatan', 'Miuosh', 'Pelson', 'Ero', 'Małpa'],

Example in MP3 above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sampsyo that's why it's disabled (answer to review below).
I don't know how to handle this.
There's no class like that and I don't know if I need to write mine (would like not to) or if there's any implementation.

Copy link
Contributor Author

@JuniorJPDJ JuniorJPDJ May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've more problems:
I mentioned TXXX:ARTISTS consisting of array of artists but...
I also have TXXX:ARTISTS consisting of one artist but also containing unescaped /:
'TXXX:ARTISTS': TXXX(encoding=<Encoding.UTF16: 1>, desc='ARTISTS', text=['AC/DC']),

Copy link
Contributor Author

@JuniorJPDJ JuniorJPDJ May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be similar issue to problem I've encountered in #43:
ID3v23 multivalue tags are not being parsed as multivalue in mutagen - value in not being split on separator and separator.. can be already part of list element.
I need some advice about this.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this started! I know it's still marked as WIP, but FWIW the tests seem to need some attention (according to CI results).

@@ -1801,8 +1813,17 @@ def update(self, dict):
MP3DescStorageStyle(u'BARCODE'),
MP4StorageStyle('----:com.apple.iTunes:BARCODE'),
StorageStyle('BARCODE'),
StorageStyle('UPC', read_only=True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Can you explain a little more about why you want that behavior? (I think this is the way to achieve it, but some testing would be useful, probably.)

mediafile.py Outdated Show resolved Hide resolved
@JuniorJPDJ
Copy link
Contributor Author

JuniorJPDJ commented May 4, 2021

FWIW the tests seem to need some attention (according to CI results).

I was looking at those (already fixed style check) and it didn't look like it failed on my changes. I need someone to clarify this then and maybe suggest some changes.

EDIT: It fails on my changes: just rolled back this commit in my tree and it worked before, but I don't really know which change made it fail.

EDIT2: I managed to fix it. Commits will come tomorrow. :D

@JuniorJPDJ
Copy link
Contributor Author

JuniorJPDJ commented May 10, 2021

albumartists is suggested here: https://kodi.wiki/view/Music_tagging#albumartists and I'd like to make it default in Picard too

EDIT: Picard is probably gonna have it! https://community.metabrainz.org/t/multiple-album-artists/532302

@JuniorJPDJ JuniorJPDJ marked this pull request as ready for review May 10, 2021 22:50
@JuniorJPDJ JuniorJPDJ changed the title New tags: Copyright, ISRC, artists New tags: Copyright, ISRC, Artists, AlbumArtists May 10, 2021
@JuniorJPDJ JuniorJPDJ force-pushed the master branch 4 times, most recently from 6ff1a87 to ecb862c Compare May 11, 2021 18:05
@JuniorJPDJ JuniorJPDJ changed the title New tags: Copyright, ISRC, Artists, AlbumArtists New tags: Copyright, ISRC, Artists, AlbumArtists, URL May 11, 2021
@JuniorJPDJ
Copy link
Contributor Author

I think it's ready ;D

MP3ListDescStorageStyle is multivalue tag for description based custom tags on MP3 files (like TXXX:Something), very similar to MP3DescStorageStyle but multivalue
@JuniorJPDJ
Copy link
Contributor Author

I rebased changes on top of latest commit.

mediafile.py Outdated Show resolved Hide resolved
@sampsyo
Copy link
Member

sampsyo commented May 12, 2021

Thanks for continuing to push this forward! I have to admit that I'm a tiny bit lost in navigating all the changes at this point, so I thought I'd try to sum things up:

  • There's new support for storing lists in ID3v2.3 tags.
  • The mb_{album}artistid fields have become list-valued (depending on the above new feature).
  • There are some other brand-new fields (not the above ID fields) that are added.

And of course the tests are updated as well, but that's neither here nor there. Is that an accurate summary?

With apologies for being a picky maintainer, I would love to consider all of these changes individually. It would be great, for example, to carefully consider whether we do actually want to change those ID fields to be lists—unfortunately, I can imagine this change breaking software that depends on their current list behavior. And when we add new fields, we should consider adding them to the documentation. So is there any chance we could break this up into individual steps?

Broadly speaking, adding the new fields should be fairly uncontroversial, although I do want to track down the source of truth (instead of just trusting the HydrogenAudio wiki alone). The IDs-as-lists change could be problematic, however, and needs some careful consideration.

Thanks for considering it!

@JuniorJPDJ
Copy link
Contributor Author

JuniorJPDJ commented May 12, 2021

Of course ;)

The mb_{album}artistid fields have become list-valued (depending on the above new feature).

Will moving this to another PR satisfy you?
Other changes are pretty much depending on themselves so I don't feel like splitting them out is good idea, but if you have other opinion I could do this.

@JuniorJPDJ
Copy link
Contributor Author

JuniorJPDJ commented May 13, 2021

I moved multivalue musicbrainz to #46.

JuniorJPDJ added a commit to FUMR/tidal-async that referenced this pull request May 14, 2021
also updates gen_title and gen_artists to better format

depends on beetbox/mediafile#42
Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome; thanks for slimming this PR down!! Is there any chance you could include in the thread here any links to documentation you have about where all these tags came from? You've mentioned the HA wiki and MusicBrainz Picard—does that cover all of these tags, or did you use any other sources? (It's always nice to have documentation for exactly why we used the tag mapping we did.)

test/test_mediafile.py Outdated Show resolved Hide resolved
mediafile.py Show resolved Hide resolved
@JuniorJPDJ
Copy link
Contributor Author

Is there any chance you could include in the thread here any links to documentation you have about where all these tags came from?

Of course!
I'll do detailed writeup, just gimme an hour or two ;D

@JuniorJPDJ
Copy link
Contributor Author

JuniorJPDJ commented May 15, 2021

artists: whole thing is defined in Picard tag mapping

copyright: same as above

albumartists: I used Kodi's scheme as it's non-standard at the moment and it seems reasonable
ATM adding this tag to Picard is being discussed here and here and I'd like to support all variants, that's why I created #47 (it needs more changes to code)
I decided it's better to support main variant first, then fix issues with multivariant and then support all variants

barcode changes: I already shown my point of view in discussion above

isrc: also based on Picard tag mapping

url: there things are going a bit worse
Most of standards seem not to support URL mapping so I was basing on files tagged by Tag&Rename, some people from my environment were using, decoded by mutagen

  • ID3 is directly supported by WXXX frame and it's defined in spec
  • MP4 mapping is ©url and it's based on output of Tag&Rename but I wasn't able to find much info about it in google; ffprobe decoded it correctly so it seems to be some standard:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'kdu4r.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.83.100
    compilation     : 0
    URL             : https://www.aliexpress.com/
    media_type      : 0
>>> dict(mutagen.File('kdu4r.mp4'))
{'©too': ['Lavf57.83.100'],
 'cpil': False,
 '©url': ['https://www.aliexpress.com/'],
 'furl': ['https://basewin.pl/'],
 'aurl': ['http://chomikuj.pl/'],
 'surl': ['https://drive.google.com/drive/my-drive'],
 'rurl': ['https://ekipatonosi.pl'],
 'burl': ['facebook.com'],
 'stik': [0]}
  • ASF binding was also got from actual wma file tagged by Tag&Rename:
>>> dict(mutagen.File('01. MAROON STRONG.wma'))
{'WM/AudioFileURL': [ASFUnicodeAttribute('https://www.instagram.com/')],
 'Description': [ASFUnicodeAttribute('')],
 'WM/EncodingSettings': [ASFUnicodeAttribute('Lavf58.49.100')],
 'WM/RadioStationURL': [ASFUnicodeAttribute('https://www.netflix.com')],
 'WM/URL': [ASFUnicodeAttribute('https://hostuje.net/')],
 'WM/AuthorURL': [ASFUnicodeAttribute('https://jakdojade.pl')],
 'ID3': [ASFByteArrayAttribute(b'ID3\x03\x00\x00\x00\x00\x00/WXXX\x00\x00\x00\x16\x00\x00\x00\x00https://hostuje.net/COMM\x00\x00\x00\x05\x00\x00\x00eng\x00')],
 'WM/UserWebURL': [ASFUnicodeAttribute('https://hostuje.net/')],
 'WM/MusicBuyUrl': [ASFUnicodeAttribute('https://odrabiamy.pl/')],
 'WM/AudioSourceURL': [ASFUnicodeAttribute('https://www.messenger.com')]}

This is however not translated by ffprobe:

Input #0, asf, from '01. MAROON STRONG.wma':
  Metadata:
    encoder         : Lavf58.49.100
    WM/URL          : https://hostuje.net/
    WM/UserWebURL   : https://hostuje.net/
    WM/AudioFileURL : https://www.instagram.com/
    WM/AuthorURL    : https://jakdojade.pl
    WM/AudioSourceURL: https://www.messenger.com
    WM/RadioStationURL: https://www.netflix.com
    WM/MusicBuyUrl  : https://odrabiamy.pl/

Nor Windows details view show it. Windows however shows another URL tag.
image
This is AuthorURL. But purpose of this tag is other than main URL.

I found some forum post that also mentioned this tag so I assumed It's widely supported.

  • Vorbis/FLAC binding used by Tag&Rename was URL but I added also WWW and Website as I found actual uses of them in web. This also isn't standarized in any way. EasyTag seems to tag URL input box into CONTACT field on flac files which seems ambiguous to me. Standard defines CONTACT field as

Contact information for the creators or distributors of the track. This could be a URL, an email address, the physical address of the producing label.

Which is not what I wanted to achieve by url tag, it's more like authorurl.

Also adds additional barcode tags to read from
JuniorJPDJ added a commit to FUMR/tidal-async that referenced this pull request May 15, 2021
also updates gen_title and gen_artists to better format

depends on beetbox/mediafile#42
Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extra information; this helps a lot!

I admit I'm a bit skeptical of the multiple Vortis-style mappings for barcode and the new url field. The former is a bit less worrisome because the fields are read-only, but even so, we don't have a lot of evidence that we need to support these mappings. The latter is a bit more worrisome because I know people who use Vorbis comments a lot don't love having duplicate data in their free-form tag data, so multiple mapping should be approached quite carefully.

I know there's no standard approach for these tags, but maybe a good place to start would be to use the most popular mappings that we know are in use in real, frequently used software. Then, we can respond as usage becomes more popular: that is, when someone complains that their other software isn't compatible, we can add a mapping to compensate. What do you think about this more minimalist approach?

mediafile.py Show resolved Hide resolved
@JuniorJPDJ
Copy link
Contributor Author

I know people who use Vorbis comments a lot don't love having duplicate data in their free-form tag data

Well.. They won't be happy about albumartists from #47 then :X

I can remove WWW and Website then, I'm OK with that. Just wanted to maximize compatibility ;)

I'm still struggling with barcode as some standards are not-cool and are defining separate fields for other barcode standards and I honestly would like to have them in one tag, which is understood by most of software.
What do you think, should I remove it or leave it here?

@sampsyo
Copy link
Member

sampsyo commented May 16, 2021

Sounds great! For barcode: just to clarify, are you saying that there's a trend among software out there to define lots of different fields for different barcode classification standards, such as UPC and EAN?

I can see how that argues in favor of supporting all these miscellaneous tags as read-only mappings and "BARCODE" itself as the only writable version! Googling around reveals threads like this one where people are equally frustrated by the proliferation of different tag names for different kinds of barcodes.

Anyway, this sounds perfectly reasonable to me. Happy to do the "read-only for weird names" thing here—until someone out there wants to record both a UPC and an EAN for the same file. 😂

@JuniorJPDJ
Copy link
Contributor Author

JuniorJPDJ commented May 16, 2021

are you saying that there's a trend among software out there to define lots of different fields for different barcode classification standards, such as UPC and EAN?

That's what I want to say. Yes! :D
Examples are here: dBPowerAMP and some wiki for Vorbis tags.

I'll fix my changes in few minutes then and.. It's ready to merge! :D

…StorageStyle

fixes rest of problems in beetbox#43

Shouldn't be enabled on tags like `artists` or other without confidence that values will not contain separator (eg. `artists: ['AC/DC']`).
Format of mbid is known and I'm sure we can enable it here.

When solving beetbox#21 this separator value in `el.split('/')` also should be taken into account.
@JuniorJPDJ
Copy link
Contributor Author

Done! :D

@JuniorJPDJ
Copy link
Contributor Author

JuniorJPDJ commented May 16, 2021

until someone out there wants to record both a UPC and an EAN for the same file.

UPC itself can be converted to EAN by adding 0 at the front, so I think they will not try to kill me for this change even if it overwrite one of them ;)

Aaand UPN is wrong name for UPC in our case as UPN is medical-only:

Drugs by National Drug Code (NDC) number. Pharmaceuticals in the U.S. use the middle 10 digits of the UPC as their NDC number. Though usually only over-the-counter drugs are scanned at point of sale, NDC-based UPCs are used on prescription drug packages and surgical products and, in this case, are commonly called UPN Codes.

Source: Wikipedia

Musicbrainz for example allows only one barcode per release, names it barcode and forces you to use EAN, as UPC can be represented as subset of EAN (they advise how to convert UPC -> EAN).

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome!! Merging now.

@sampsyo sampsyo merged commit 7ecd861 into beetbox:master May 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants