REQ: Ability to set percentage match cutoff and/or keyword blacklist such as "live" or whitelist "official" in title. #12

Open
famewolf opened this issue May 17, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@famewolf

I've gotten some bad matches on songs, typically someone recording the singer at a live concert. I'd like it to accept only high-percentage matches, or to whitelist words like "Official" or blacklist words like "Live" in the title.

@dmzoneill
Owner

The code uses Levenshtein distance to match names and titles, and accepts a match only when the match ratio is greater than 0.8.

Unfortunately this is never going to be perfect; increasing the match ratio may get you better results.

But consider the following:

Say I record 2 songs:

  1. official song off the CD
  2. recording of me singing it in the shower.

If I then upload both of them to YouTube with the same name:

dave - the great title
dave - the great title

Which one is correct based on the name? The problem here is that user input is always at fault.

If you have some way of making a better determination, I can certainly implement it for you :)
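To make the ratio and the same-name problem concrete, here is a minimal sketch of a Levenshtein-based match ratio. The function names and the normalisation (1 minus distance over the longer length) are assumptions for illustration; the project may compute its ratio differently, for example through a library.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def match_ratio(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, lower as
    the case-insensitive edit distance grows."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


wanted = "dave - the great title"
uploads = ["dave - the great title", "dave - the great title"]
ratios = [match_ratio(wanted, title) for title in uploads]
```

Both uploads score 1.0 here, so no threshold on the title alone can tell the CD version from the shower recording.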

@famewolf
Author

famewolf commented Jun 15, 2023

The problem is I'm getting a lot of matches where it IS the artist singing the song, but it's at some club where you can barely hear them and a bunch of people are talking over them. I then have to go delete that particular track, and if I ever run ydl again it's presumably going to grab the same track. I'd rather set the minimum ratio to 0.9 and/or ignore tracks with "live" in the title. I don't see why allowing user-set conditions would be a problem, perhaps via environment variables.
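Something like the following is what I have in mind; it is only a sketch. The environment variable names (`MATCH_MIN_RATIO`, `TITLE_BLACKLIST`, `TITLE_WHITELIST`) and both function names are made up for illustration and do not exist in the project.

```python
import os
import re


def load_match_settings(env=os.environ):
    """Read hypothetical user settings; the variable names are assumptions."""
    def words(name):
        raw = env.get(name, "")
        return [w.strip().lower() for w in raw.split(",") if w.strip()]

    min_ratio = float(env.get("MATCH_MIN_RATIO", "0.8"))
    return min_ratio, words("TITLE_BLACKLIST"), words("TITLE_WHITELIST")


def title_allowed(title, blacklist, whitelist):
    """Reject titles containing a blacklisted word; if a whitelist is
    given, require at least one whitelisted word."""
    # Compare whole words so that "Alive" is not rejected for "live".
    title_words = set(re.findall(r"[a-z0-9]+", title.lower()))
    if any(word in title_words for word in blacklist):
        return False
    if whitelist and not any(word in title_words for word in whitelist):
        return False
    return True
```

With `MATCH_MIN_RATIO=0.9` and `TITLE_BLACKLIST=live,karaoke`, a result titled "Dave - The Great Title (Live at the Club)" would be skipped before the ratio is even checked.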

@dmzoneill
Owner

I'll take a look today at adding some flexibility around matching.
But it won't solve the problem for you.

The code actually does the following already:

  1. runs the search
  2. iterates over the search results, checking the names by Levenshtein distance
  3. gives each search result a match ratio, e.g. 0.95
  4. picks the result with the highest match ratio
  5. finally, accepts it only if that ratio is greater than 0.8
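Steps 2 to 5 can be sketched roughly as below. This is an illustration, not the project's code: `pick_best` and `similarity` are hypothetical names, and `difflib.SequenceMatcher` from the standard library stands in for the Levenshtein ratio, so the exact scores will differ.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT a Levenshtein ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_best(
    wanted: str,
    titles: Iterable[str],
    min_ratio: float = 0.8,
    ratio: Callable[[str, str], float] = similarity,
) -> Optional[Tuple[str, float]]:
    """Score every search result, keep the highest-scoring one, and
    accept it only if its ratio is strictly greater than min_ratio."""
    best_title, best_ratio = None, 0.0
    for title in titles:
        score = ratio(wanted, title)
        if score > best_ratio:  # on a tie the earlier result wins
            best_title, best_ratio = title, score
    if best_title is not None and best_ratio > min_ratio:
        return best_title, best_ratio
    return None
```

Making `min_ratio` a parameter is the flexibility being discussed; raising it to 0.9 rejects more near-misses but does nothing about two uploads with identical titles.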

I'll add the option to increase the 0.8.
But I'm 100% sure you will still have people on YouTube uploading bootleg rips, live gigs, karaoke, whatever, perfectly named "artist - track name", and that will always match the search 100%.

@cxtal

cxtal commented Jan 20, 2024

One solution is to use a service like audiotag.info (based on a database of audio spectrograms) to attempt to identify some extracts from the downloaded YouTube video. Perhaps you could extract multiple samples from the same song, say at the start, middle, and end, and apply some rules: e.g. if all samples can be matched, then it is highly likely to be a YouTube video with just the song itself and not someone's karaoke party. I see audiotag.info has an API, and maybe there are better alternatives, but it should be possible to judge the quality of a downloaded YouTube song a little better than by matching just metadata, which is what Lidarr itself seems to be doing.
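A rough sketch of the rule I mean, with the recognition service left abstract: `identify` is a hypothetical callable that takes an offset and a length in seconds and returns the recognised track title or `None`. It is not the audiotag.info API, and the 15-second sample length is an arbitrary assumption.

```python
from typing import Callable, List, Optional


def sample_offsets(duration_s: float, sample_s: float = 15.0) -> List[float]:
    """Start offsets (seconds) for samples at the start, middle and end."""
    if duration_s <= sample_s:
        return [0.0]  # track too short for three separate samples
    last = duration_s - sample_s
    return [0.0, last / 2, last]


def looks_like_studio_track(
    duration_s: float,
    expected_title: str,
    identify: Callable[[float, float], Optional[str]],
    sample_s: float = 15.0,
) -> bool:
    """Accept the download only if every sample is recognised as the
    expected track; one unrecognised sample is enough to reject it."""
    for offset in sample_offsets(duration_s, sample_s):
        found = identify(offset, sample_s)
        if found is None or found.lower() != expected_title.lower():
            return False
    return True
```

Requiring all samples to match is the strictest rule; a majority vote would be more forgiving of a long intro or outro.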

@dmzoneill
Owner

Could definitely look into such a solution :)
PRs are also welcome.
I'm working on a number of things currently.

@famewolf
Author

famewolf commented Mar 4, 2024 via email

@dmzoneill dmzoneill added the enhancement New feature or request label Mar 25, 2024
3 participants