Find the closest matches does not find the closest matches #120

Open
touwys opened this issue Feb 27, 2023 · 14 comments

Comments

@touwys

touwys commented Feb 27, 2023

This is an issue observed throughout testing:

The issue here is that the matched tracks, which are found by listFix(), do not even remotely match, or resemble, the original track. It should be noted that this occurs too often, but not in every instance.

The main thrust of the issue, however, is that these total mismatches occur while there are numerous, almost identical, copies known to exist in the Media Directory.

Also refer to the discussion here.


@touwys
Author

touwys commented Feb 28, 2023

@Borewit:

Should this issue rather be posted at the current PR-release on test?

@Borewit
Owner

Borewit commented Feb 28, 2023

No, I suspect it is not introduced by that PR, so better to capture like this.

@Borewit added the bug label (Something isn't working) Feb 28, 2023
@touwys
Author

touwys commented Feb 28, 2023

> No, I suspect it is not introduced by that PR, so better to capture like this.

Yes, I'll leave it here, as I have noticed the issue all along while testing the other PRs as well.

@touwys
Author

touwys commented Apr 18, 2023

@Borewit:

Operationally, the ultimate measure of success of listFix() is going to be determined by how effective it is in finding the closest matches for mismatched playlist tracks. Speed and accuracy are the two essential ingredients to this. At the moment, speed is sufficient, but accuracy lacks — by a wide margin. Since accuracy is crucial to successful outcomes, I propose that this issue is put up next for a fix.

@Borewit
Owner

Borewit commented Apr 20, 2023

You do the groundwork for this one, @touwys: good examples where the matching algorithm currently fails (does not pick the best result) are very useful.

I have two ideas to improve the matching algorithm:

  1. Prefer a track from the same folder as the missing entry, if a reasonable match is found there
  2. As the parent folder(s) name(s) could represent the title, or artist, I think we could take those into account
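
A rough sketch of how both ideas could be combined, purely for illustration. The class name, the bonus value and the number of parent levels are assumptions, not existing listFix() code. Candidates in the same folder as the missing entry get a fixed bonus, and the words from the nearest parent folder names are added to the file name words before comparing.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class FolderAwareScoreSketch {
    // Hypothetical tuning values, not taken from listFix().
    static final int SAME_FOLDER_BONUS = 2;
    static final int PARENT_LEVELS = 2;

    /** Lower-cased words of one path segment; single-character words are dropped (assumption). */
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (part.length() >= 2) {
                result.add(part);
            }
        }
        return result;
    }

    /** Words of the file name (extension removed) plus those of the nearest parent folders. */
    static Set<String> pathTokens(String path) {
        // Split on both separators so Windows paths also work on other platforms.
        String[] segments = path.split("[\\\\/]+");
        String fileName = segments[segments.length - 1];
        int dot = fileName.lastIndexOf('.');
        Set<String> result = tokens(dot > 0 ? fileName.substring(0, dot) : fileName);
        int first = Math.max(0, segments.length - 1 - PARENT_LEVELS);
        for (int i = first; i < segments.length - 1; i++) {
            result.addAll(tokens(segments[i]));
        }
        return result;
    }

    /** Everything before the last path separator, or an empty string if there is none. */
    static String parentFolder(String path) {
        int cut = Math.max(path.lastIndexOf('\\'), path.lastIndexOf('/'));
        return cut < 0 ? "" : path.substring(0, cut);
    }

    /** Number of shared words, plus a bonus when both paths have the same parent folder. */
    static int score(String missingPath, String candidatePath) {
        Set<String> shared = pathTokens(missingPath);
        shared.retainAll(pathTokens(candidatePath));
        boolean sameFolder =
            parentFolder(missingPath).equalsIgnoreCase(parentFolder(candidatePath));
        return shared.size() + (sameFolder ? SAME_FOLDER_BONUS : 0);
    }
}
```

In this sketch a candidate stored under an artist folder picks up the artist name from that folder even when its file name lacks it, while a candidate next to the missing entry is still favoured through the bonus. How large that bonus should be would need testing against real playlists.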

@touwys
Author

touwys commented Apr 21, 2023 via email

@touwys
Author

touwys commented Apr 26, 2023

@Borewit, on second thoughts, we have to pay this issue long and careful attention, because there is more to it than casually meets the eye. The supremacy of listFix() as an app stands completely on the quality of its search for matching tracks: how closely the search results match the originals of a fractured playlist. Now, while it probably constitutes a challenge bigger than it's worth considering, don't you think that we should replace the current search model with one that uses the music file tags? Searching the tags provides many more options with which to improve the accuracy of the algorithm.

@Borewit
Owner

Borewit commented Apr 28, 2023

> don't you think that we should replace the current search model with one deploying the music file tags?

That idea crossed my mind a few times.
Reading metadata, or alternatively using acoustic fingerprints, are far better ways to identify audio tracks than just the filename, which could be as meaningless as "track 01". However, this information is not available in the playlist, and if the track is missing there is nothing to read from the original track. The only thing we have is the filename.

@Borewit
Owner

Borewit commented Apr 29, 2023

A useful start for me would be if you could tap into the existing algorithm, and "translate" its current train of reasoning for me.

Based on: #106 (comment)

The "closest match" is purely based on the filename portion of the audio track, so excluding the parent folder.

public int score(String filename1, String filename2)
{
    return scoreMatchingTokens(splitFileName(filename1), splitFileName(filename2));
}

This score function splits the file name into words, e.g.: "01 Madonna - Like a Prayer.mp3" becomes something like: ["01", "Madonna", "Like", "Prayer"].

The scoreMatchingTokens function then compares these words with the words of each track in your library, which is converted to a similar list of words. The score is basically calculated by comparing those two sets of words.

private int scoreMatchingTokens(List<String> array1, List<String> array2)

Based on that score the matches are sorted, and the highest scored matches are kept.
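
To illustrate the reasoning above, here is a minimal, self-contained sketch. It is not the actual listFix() implementation: the method bodies, the lower-casing, the rule of dropping single-character words and the closestMatches helper are assumptions made only to show the general idea of splitting file names into words, counting shared words and sorting by that count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class FileNameMatcherSketch {

    /** Splits a file name into lower-cased words, ignoring the extension. */
    static List<String> splitFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        List<String> tokens = new ArrayList<>();
        for (String part : base.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // Assumption: words of a single character (such as "a") carry no meaning.
            if (part.length() >= 2) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Assumed scoring rule: the number of distinct words both lists have in common. */
    static int scoreMatchingTokens(List<String> tokens1, List<String> tokens2) {
        Set<String> shared = new HashSet<>(tokens1);
        shared.retainAll(new HashSet<>(tokens2));
        return shared.size();
    }

    static int score(String fileName1, String fileName2) {
        return scoreMatchingTokens(splitFileName(fileName1), splitFileName(fileName2));
    }

    /** Sorts the candidates by descending score and keeps the best ones. */
    static List<String> closestMatches(String missing, List<String> candidates, int limit) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((String c) -> score(missing, c)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```

With "01 Madonna - Like a Prayer.mp3" as the missing entry, the candidates "Madonna - Like a Virgin.mp3" and "Like a Prayer.flac" both score 2 in this sketch. That shows how a plain count of shared words can rank a different track as high as a genuine copy.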

@touwys
Author

touwys commented Apr 29, 2023 via email

@Borewit
Owner

Borewit commented Apr 29, 2023

The point is, there is no original file to compare with @touwys.

Let's assume you have the following M3U playlist:

#EXTM3U
C:\Users\Borewit\Music\Rodriguez - Rich Folks Hoax.flac

Yet that file does not exist at that location, since I moved it to:
C:\Users\Borewit\Music\Rodriguez\1970 - Cold Fact\Rich Folks Hoax.flac

So we only have the path: no size, no tags, nothing else. File size would not be a reliable indicator either, as it depends strongly on the encoding of the audio file. You may, for example, want to restore a playlist while replacing MP3 tracks with FLAC.

@touwys
Author

touwys commented Apr 29, 2023 via email

@Borewit
Owner

Borewit commented Apr 29, 2023

Thanks, thus I stand corrected.

Then I probably don't understand point 2 of #120 (comment).

@touwys
Author

touwys commented Apr 29, 2023

> Thanks, thus I stand corrected.
>
> Then I probably don't understand point 2 of #120 (comment).

Your reply was spot-on. It is I who am moving on very unfamiliar terrain as far as untangling the intricacies of Windows, and other software operations and their interconnectedness, are concerned.

I took note that there is literally nothing else to work with than the basic filename. If the horse is dead already, how many ways are left to saddle it? The salient question is, is it yet possible to improve upon the quality of the listFix() search results? The restrictions imposed by the file name ("what" to search for) don't apply to the method of search ("how" to search), and this could be the more fruitful avenue of investigation.


@Borewit added the improvement label and removed the bug label (Something isn't working) May 6, 2023