Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

erroneous ID3 tag info #27

Open
ejhumphrey opened this issue Jul 18, 2018 · 1 comment
Open

erroneous ID3 tag info #27

ejhumphrey opened this issue Jul 18, 2018 · 1 comment

Comments

@ejhumphrey
Copy link

I'm not sure if this relates to #4, but I've found that at least sox (on debian!) tries to parse out file duration using the reported bit-rate. Unfortunately for me, the reported bitrate is way wrong for at least ≈90 tracks (of the 100k+), and probably wrong for another couple hundred ... these particularly bad tracks claim to have bitrates in excess of "100M", which sox (at least) parses as bits per second. I'd point out that stereo 16bit wav is 1.4Mbps.

The list of suspicious file IDs is here, if anyone wants to double-check / confirm ... the extension is txt, but it's JSON formatted, key point to sox-reported bitrate.

More fortunately, removing all the ID3 tags fixes the issue. I'd propose perhaps exporting all ID3 tags to a static dump over the collection (per #4), and then removing all the ID3 tags to sanitize the collection.

@mdeff
Copy link
Owner

mdeff commented Jul 20, 2020

Thanks for the investigation @ejhumphrey.

I'd propose perhaps exporting all ID3 tags to a static dump over the collection (per #4), and then removing all the ID3 tags to sanitize the collection.

Seems like a good solution. Do you know of any other metadata that should be cleaned or removed to sanitize such audio collection?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants