Duplicate title entries after library update #547

Closed
schnillerman opened this issue Mar 8, 2021 · 55 comments
Labels
bug scanner Problems related to the scanner

Comments

@schnillerman

Whenever I update files that are registered in the library, a few of them are registered as duplicates (in the same album) in the library after a re-scan.
The nature of the file update can be just (mp3-) tag updates, but also file renaming (directory name remains the same).

The only solution I have found to this is to rename or move the album's directory, re-scan, rename/move it back to its original state and do a second rescan.

Version: 8.2.0 - 1614990095 @ Sat Mar 6 01:43:25 CET 2021

This happens in earlier 8.x and 7.x versions as well.

@mherger
Contributor

mherger commented Mar 8, 2021

A full wipe cache & scan would do without moving files, wouldn't it?

@schnillerman
Author

schnillerman commented Mar 8, 2021

I just tried this - now all my favorites are gone. :(

And yes, it fixes the duplicate entries, but it usually takes longer (2.5 h) than two rescans (7 minutes per rescan for 215,000 titles): just the database deletion takes as long as one rescan.

And it would seem weird to me if multiple library entries exist (within the same library, of course) that refer to one and the same file. Maybe a library consistency check that takes care of duplicate entries for same file would be helpful.

@mherger mherger added the scanner Problems related to the scanner label Aug 23, 2021
@bobbydriver

I would love to see this one fixed, it's been a long standing problem I noticed too. I use the same workaround (renaming the directory) - and a full rescan is not really a practical option for those of us with large libraries.

Feels like there ought to be an easy solution in the scanner - as @schnillerman suggests, a consistency check or some such

@michaelherger
Member

Could one of you please outline how that easy consistency check would work?

@schnillerman
Author

schnillerman commented Jan 27, 2022 via email

@bobbydriver

What seems to be happening is - when you change a file within an album somehow, or add a new file to the album - then do a new/changed scan:

The scan finds the new file and creates it within a new album, so you end up with two duplicated albums

1 - the original album with the unchanged files but not the changed/new
2 - the new album with just the changed/new file and none of the unchanged files

So the logic needs to be something like

  1. new file is found
  2. read album tag
  3. read the folder path
  4. does an album with the same name exist with the same folder path?
  5. if yes then add the file to the existing album
  6. if no - carry on as before and create a new album
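
The lookup described in the steps above can be sketched in a few lines. This is only a toy model, not LMS scanner code: the function `assign_album`, the in-memory `albums` dict and the case/whitespace normalisation of the album tag are all assumptions made for illustration.

```python
import posixpath

def assign_album(albums, track_path, album_tag):
    """Return an album id for a track.

    `albums` is a hypothetical in-memory index mapping
    (normalised album title, folder path) -> album id.
    """
    folder = posixpath.dirname(track_path)          # step 3: read the folder path
    key = (album_tag.strip().casefold(), folder)    # step 2: album tag (normalised, an assumption)
    if key not in albums:                           # step 4: same name AND same folder?
        albums[key] = len(albums) + 1               # step 6: create a new album
    return albums[key]                              # step 5: reuse the existing album

albums = {}
first = assign_album(albums, "/music/Artist/Album/01 - Intro.mp3", "Album")
second = assign_album(albums, "/music/Artist/Album/02 - Song.mp3", "album ")
other = assign_album(albums, "/music/Artist/Other/01 - Song.mp3", "Album")
```

Here `first` and `second` end up with the same album id because they share a folder and a (normalised) album tag, while `other` gets its own album because the folder differs.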

@schnillerman
Author

schnillerman commented Jan 27, 2022 via email

@michaelherger
Member

Now here's the problem: the reason a regular scan is so much faster than a full wipe & rescan is that the former only deals with changed items and doesn't do these kinds of optimisations and checks. Any additional check will slow it down.

In order to keep things as fast as possible, we have to be sure what we're talking about. The issue subject line says "Duplicate title entries". The description says "duplicates (in the same album)". And the latest suggestion is about duplicated albums. Maybe both are valid. And I'm pretty sure complaints about genres have been heard, too...

I fear that fixing all of this would need an amount of time I currently don't have.

@mherger mherger added the bug label Jan 28, 2022
@mherger
Contributor

mherger commented Jan 28, 2022

Oh, and artists: #704

@bobbydriver

Thanks Michael - appreciate that it's probably a lot of effort. If I get the chance I might set up a test rig and do some proper documentation of the issues/scenarios. I don't know Perl so I couldn't do anything with the scanner, but I could at least work out the SQL queries that ID the culprits

As for the scan time - I had the exact same thought. It would really need to be a separate scan option for occasional use. A "tidy/remove duplicates scan" or something

In actual fact I'm more than happy with all the LMS functionality these days and the only thing left which bugs me is the way the new/changed scan can make a mess of db integrity. I'd actually really love a UI that allowed me to query and tidy up my music db without the inconvenience of a full drop and rescan, but I know that's dreamland :)

@michaelherger
Member

Could both of you please describe what tag you'd change (artist, album, title...), and what the outcome would be? I think I've identified one issue if you changed some tracks' artist names without getting rid of the original artist name (eg. different artists of the same name, you rename only one of them). This could likely cause empty albums in the original artist's collection (see #704).

@michaelherger
Member

Would #705 be a duplicate of this issue?

@mherger
Contributor

mherger commented Jan 29, 2022

Those affected by the file renaming issue: what OS are you using?

@schnillerman
Author

schnillerman commented Jan 29, 2022

Happens when I change attributes like

  • track number padding
  • genre
  • upper/lower case of band name, album or track title

If the file name upper/lower case is changed, it happens as well.

I'm running LMS on a Linux Debian.

@michaelherger
Member

I think I've identified the cause of the duplication in case of a file name case change. See #705 (comment). There's some background information, and how you might be able to work around / fix this until I have a fix in LMS.

@michaelherger
Member

Could you please give the 8.3 nightly a try (https://downloads.slimdevices.com/nightly/?ver=8.3)? I applied a few changes to the scanner. I'm no longer able to get invalid records after

  • I changed file name from ABC.mp3 to abc.mp3
  • I changed genre, back and forth
  • I changed "Track title" to "Track Title"
  • I changed "Album Title" to "Album title"

@schnillerman
Author

schnillerman commented Feb 1, 2022

I just installed 8.3 over 8.2 and will have a look!

Do I need to perform a complete re-scan?

@bobbydriver

I loaded the nightly and did the same tests - works ok for me too (on Raspbian 10 Buster/Max2Play)

The duplicate albums still get created though if you fundamentally change a filename (other than a case change) - or add new files to the album folder - then run a new/changed rescan. Does that need to be raised as a separate issue to keep things clear?

@schnillerman
Author

> The duplicate albums still get created though if you fundamentally change a filename (other than a case change) - or add new files to the album folder - then run a new/changed rescan. Does that need to be raised as a separate issue to keep things clear?

Thank you so much for mentioning this behavior - I forgot that this happens to me a lot, too, because I've been working around this by temp_renaming the updated folder, scanning, re-naming again, re-scanning!

@bobbydriver

Just did a test: added some new files to an existing album folder.
Essentially the scan is picking up the new files by timestamp, and creating them as a new album - not recognising that the album already exists and that they should be added to the existing album

I realise that adding this integrity step to a new/changed files rescan will slow things down, but maybe not too much? After all - it only needs to be run against the new files discovered

If you put the cover art in each album folder, then the SQL to id the existing duplicates is quite simple - because although it allocates a new album id to the new files - the value for cover (which is essentially the path to the cover.jpg) is the same for both the new and existing albums

-- Covers (i.e. album folders) that are shared by more than one album id
SELECT DISTINCT album, cover
FROM tracks
WHERE cover IN (
    SELECT cover
    FROM tracks
    GROUP BY cover
    HAVING COUNT(DISTINCT album) > 1
);

Not sure how this works for people who use embedded cover art though
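
To see what the cover-based query returns, it can be run against a tiny in-memory SQLite database. This is a minimal sketch: the `tracks` table below is a stripped-down stand-in with only the columns the query touches, and the sample rows are invented, so it does not reflect the full LMS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced version of the tracks table (assumption).
conn.execute(
    "CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, album INTEGER, cover TEXT)"
)
conn.executemany(
    "INSERT INTO tracks (title, album, cover) VALUES (?, ?, ?)",
    [
        ("Track 1", 1, "/music/A/cover.jpg"),
        ("Track 2", 1, "/music/A/cover.jpg"),
        ("Bonus", 2, "/music/A/cover.jpg"),  # same folder, but a new album id
        ("Solo", 3, "/music/B/cover.jpg"),   # unaffected album
    ],
)

QUERY = """
SELECT DISTINCT album, cover
FROM tracks
WHERE cover IN (
    SELECT cover FROM tracks GROUP BY cover HAVING COUNT(DISTINCT album) > 1
)
ORDER BY album
"""
suspects = conn.execute(QUERY).fetchall()
print(suspects)
```

With this sample data, album ids 1 and 2 are reported because they share one cover path, while album 3 is left out.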

@bobbydriver

bobbydriver commented Feb 1, 2022

OK - just been digging some more and that SQL is not ideal, as it also finds occurrences where you have files in the same album folder but with different album tags. That's just bad tagging/mistakes, so handy for IDing where your library is messed up, but not a definitive ID of where the new/changed scan problem has happened

I also worked out this SQL query on the albums table

-- Albums that share title, artist and year but have a different artwork hash
SELECT A.title, A.id, C.name, B.artwork
FROM albums A
JOIN contributors C ON A.contributor = C.id
JOIN albums B
  ON A.title = B.title
 AND A.contributor = B.contributor
 AND A.year = B.year
 AND A.artwork <> B.artwork
GROUP BY A.title, A.artwork;

This IDs cases where a duplicate album name has the same artist and year but a different cover art hash, which also pulls out records where the new/changed scan problem has happened. But it also IDs other issues, like where you have moved a file to a different album but not changed the album tag, or where the album tag within the folder is actually different, so again bad tagging.

Neither of these queries takes bad tagging into account, so they are only useful for manually interrogating libraries for bad integrity, not for pinpointing the new/changed scan problem
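
A small variation on the albums self-join can make manual inspection easier: adding an `a.id < b.id` condition lists each suspicious pair once instead of once per direction. The sketch below is an assumption-laden toy: the reduced `albums` and `contributors` tables and all sample rows are invented, and the pairing condition is an addition that is not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced tables holding only the columns the query needs.
conn.executescript("""
CREATE TABLE contributors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE albums (
    id INTEGER PRIMARY KEY, title TEXT, contributor INTEGER,
    year INTEGER, artwork TEXT
);
INSERT INTO contributors VALUES (1, 'Some Artist');
INSERT INTO albums VALUES
    (10, 'Live Show', 1, 1980, 'aaa111'),
    (11, 'Live Show', 1, 1980, 'bbb222'),  -- same title/artist/year, other artwork
    (12, 'Studio',    1, 1980, 'ccc333');
""")

pairs = conn.execute("""
SELECT a.title, c.name, a.id, b.id
FROM albums a
JOIN contributors c ON c.id = a.contributor
JOIN albums b
  ON b.title = a.title
 AND b.contributor = a.contributor
 AND b.year = a.year
 AND b.artwork <> a.artwork
WHERE a.id < b.id          -- report each pair only once
""").fetchall()
print(pairs)
```

Like the queries above, this still cannot tell a scanner-induced duplicate from plain bad tagging; it only narrows down which album ids to look at.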

@schnillerman
Author

schnillerman commented Feb 1, 2022 via email

@bobbydriver

haha - no, but I did laugh when I saw that episode

@michaelherger
Member

michaelherger commented Feb 1, 2022 via email

@schnillerman
Author

schnillerman commented Feb 2, 2022

What I usually did in order to produce the duplicate DB entries (but with 8.3, the behavior seems to be different):

  1. Tag files (with e.g. mp3tag), same album: mistakenly have some of the files with a different year (e.g. 1-5 of 12 with year 2011, 7-12 of 12 with year 2012)
  2. Save them to library (tag to dir/file name; year is included in dir name) -> files 1-5 and 7-12 are in different subdirs
  3. Scan
  4. 2 albums (one for year 2011, one for 2012) are created in LMS DB
  5. Correction of file tags and file location (mp3tag)
  6. Re-scan
  7. 2 albums with same year, artist, album name are shown in LMS, one with files 1-5, one with files 7-12, even though files are now in same subdir

As I mentioned above, this behavior seems to be different with LMS 8.3:

  1. Tag files (with e.g. mp3tag), same album: mistakenly have some of the files with a different year (e.g. 1-5 of 12 with year 2011, 7-12 of 12 with year 2012)
  2. Save them to library (tag to dir/file name; year is included in dir name) -> files 1-5 and 7-12 are in different subdirs
  3. Scan
  4. 1 album is now in DB with year 2012 - verified also by looking for artist: only one album with the same name is present, even though files with year 2011 tagged also show value 2011 in year tag (verified by looking at individual song via "more > further info > show tags")
  5. Correction of file tags, dirs and file names in mp3tag
  6. Re-Scan
  7. Album still shows as one entry with corrected year 2011

It seems in LMS 8.3 now it works as expected.

But what about same albums with different years? (They sometimes exist, e.g. re-releases, and the release info is only present in comment tag)?

@bobbydriver

Just done the same test as above with v8.3 and confirm the same result.
Added a new album with one file having a different date tag
It creates one album not two (as it did in 8.2)

So the problem is now just with the album tag

If I add new tracks into an existing album folder - even if the album tag is identical to the existing album tags in the same folder, it still creates a new duplicate album in the db for the new tracks

to test

  1. Take any album that is already in the library
  2. Add a new track or tracks into the folder and tag with the same album tag as the existing tracks
  3. Run a new/change rescan
  4. New duplicate album is created with just the new tracks

The behaviour is sort of understandable, as the existing tracks aren't new or changed, but the folder contents have changed

I don't know how to fix it - maybe the scan needs to look for new/changed subfolders (date modified on the folder) and rescan the whole folder?
or when it sees new/changed files it triggers a rescan of the whole subfolder that the new files sit in?

Not sure if either of these are viable
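
The second idea (rescanning the whole subfolder whenever one of its files is new or changed) could look roughly like this. It is a sketch under stated assumptions: `folders_to_rescan` is a hypothetical helper, paths are plain strings, and nothing here is taken from the LMS scanner.

```python
import posixpath
from collections import defaultdict

def folders_to_rescan(all_files, changed_files):
    """Map each folder containing a new/changed file to ALL files in that folder."""
    by_folder = defaultdict(list)
    for path in all_files:
        by_folder[posixpath.dirname(path)].append(path)
    touched = {posixpath.dirname(path) for path in changed_files}
    # Every file of a touched folder is rescanned, not only the changed one.
    return {folder: sorted(by_folder[folder]) for folder in sorted(touched)}

all_files = [
    "/music/A/01.mp3",
    "/music/A/02.mp3",
    "/music/A/03 bonus.mp3",
    "/music/B/01.mp3",
]
plan = folders_to_rescan(all_files, changed_files=["/music/A/03 bonus.mp3"])
print(plan)
```

The cost is proportional to the size of the touched folders rather than the whole library, which is the trade-off being discussed: more work than a pure new/changed scan, far less than a full wipe and rescan.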

@schnillerman
Author

Also, with LMS 8.3, if I correct the capitalization inside e.g. the title tag, so that the file name also gets renamed (same name, different capitalization), something strange happens:

The album is not duplicated, but the song in question is. Even though that song is actually playing, it is not displayed correctly in the player, nor is the playlist of that album:
[screenshot]

@michaelherger
Member

> 1. Take any album that is already in the library
> 2. Add a new track or tracks into the folder and tag with the same album tag as the existing tracks
> 3. Run a new/change rescan
> 4. New duplicate album is created with just the new tracks

This is working as expected here. Are you 100% certain album and artist information are absolutely identical? No upper/lower case issues? No whitespace?

Would you mind sharing the library.db with such an issue with me?

https://www.dropbox.com/request/T3RctyzGgNg0oFDubq6a

@michaelherger
Member

> Can I install 8.2 over 8.3 in order to do that?

Why would you want to install the previous version? It's fixed in 8.3, not 8.2.

But to answer your question: yes, you can go back and forth as you like.

@schnillerman
Author

schnillerman commented Feb 3, 2022

> Can I install 8.2 over 8.3 in order to do that?
>
> Why would you want to install the previous version? It's fixed in 8.3, not 8.2.
>
> But to answer your question: yes, you can go back and forth as you like.

Sorry, Michael - the problem of duplicate albums by adding files to a folder or capitalization changes does not seem to be an issue in 8.3 anymore - at least from what I tried.
That's why I thought that if you want that particular error, I would need to reproduce it in 8.2 - because that's where it definitely happened.
Anyway - I'll try and reproduce the error I described above (#547 (comment)) and share library.db with you via PM.
I understand that probably you have responded mainly to bobbydriver's comments, so please excuse my chipping in.

@bobbydriver

OK - getting closer to the problem now I think.

When you said you couldn't recreate the error by following the steps I described, I was surprised. So I ran through them again and I was even more surprised when I found that you were right - it added the new tracks to the correct (existing) album!

But I was sure that I had seen the issue only yesterday on the same 8.3 nightly, so I went back through the steps and managed to recreate the error in more specific circumstances

In the example, I'm using two Joy Division live shows. Both were partially included in a boxset some years ago, so I had them in my library as two separate albums, one for each partial live show.

Someone then shared remaining tracks which weren't included on the boxset and so I go to add them to each existing folder to complete the albums

In example 1 - I follow the original instructions I gave you. Added the extra files and tagged them to have the same album name as the existing files. The new files are the yellow ones and you can see the first 3 tracks are the old ones - unchanged
[screenshot: Capture1]

As mentioned - a rescan happily adds these to the existing album

In example 2 - I follow the same steps, with the only difference being that this time, when I load mp3tag to change the album tag on the new files, I highlight all the files and save the tags. This re-saves the existing tags to the existing files - even though none of them have actually changed. So now you see that the Date Modified is updated for ALL the files, but Date Created obviously stays the same for the original files
[screenshot: Capture2]

A rescan now creates the duplicate album issue
[screenshot: Capture3]

The original album with the original tracks
[screenshot: Capture4]

and the duplicate album with the additional tracks
[screenshot: Capture5]

They are both showing up as "New Music" so it's obviously changing the timestamp on the original album according to the date modified but why is it not adding the additional tracks as it does in example 1?

@bobbydriver

To add to my confusion, I tried another test case

See example 3 - here I don't add any new files to the folder, I just change an mp3 tag on existing track 1 (which changes the Date Modified on this one file only)
[screenshot: Capture6]

I was expecting a rescan to create a new album for that one file, with the rest of the tracks remaining in the old album

But it doesn't?! It just updates the existing album (see the altered title tag on track 1)
[screenshot: Capture7]

So the question now is: what is the difference between example 2 and example 3? Why does the scanner react differently to the Date Modified change?

@michaelherger
Member

Would you mind sharing your library.db (with the above duplicate albums in it!)?

https://www.dropbox.com/request/T3RctyzGgNg0oFDubq6a

Without the database it's hard to tell what's going on there.

@bobbydriver

Will do - have tidied up the duplicates from yesterday so I will create a new test example and document for you, then upload my library.db and screenshots etc

@bobbydriver

Hmm - I now seem to have corrupted my library and it's triggered a full rescan - not ideal!

On the positive side, I think I've narrowed down the exact circumstances in which the issue now occurs

Most of the error modes from older versions seem to have been fixed - which is great

While I wait to get my library back - can you try this

  1. Add a new track to an existing album folder
  2. Make sure artist/year/genre tags are all the same BUT change the album tag for ALL tracks to be something new (most common example is changing it from "Album Name" to "Album Name [Expanded Edition]")
  3. You should see the existing tracks still have their original Date Created but all the tracks will have a new Date Modified

Run a new/changed scan

What happens? For me I get 2 new albums created (one with the existing tracks and one with just the new track)

@michaelherger
Member

Thanks @bobbydriver! I received your files and will investigate. Can you confirm you're using the latest LMS 8.3?

@michaelherger
Member

Oh, I think I know what's going on: new tracks are scanned before the updated tracks. The new tracks therefore create a new album, because their album doesn't exist yet. Only once that's done the modified tracks would be updated. And as they already exist, the album referenced in the track would be updated, rather than the track linked to a new album. This causes the previous album to become a duplicate of the new one. That might become tricky to fix.
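
The ordering effect described here can be reproduced with a toy simulation. Everything in it is an assumption for illustration: the `scan` function, the dict-based "database" and the two processing steps only mimic the behaviour as explained in this comment and are not taken from the LMS scanner.

```python
def scan(order):
    """Simulate a new/changed scan; `order` lists the steps to run in sequence."""
    albums = {1: "Album Name"}            # album id -> title
    tracks = {"01.mp3": 1, "02.mp3": 1}   # track -> album id
    new_tag = "Album Name [Expanded Edition]"
    changed, new = ["01.mp3", "02.mp3"], ["03.mp3"]

    def process_new():
        # A new track joins an album with a matching title, else creates one.
        for track in new:
            match = [aid for aid, title in albums.items() if title == new_tag]
            if match:
                album_id = match[0]
            else:
                album_id = max(albums) + 1
                albums[album_id] = new_tag
            tracks[track] = album_id

    def process_changed():
        # A changed track keeps its album; the referenced album is renamed in place.
        for track in changed:
            albums[tracks[track]] = new_tag

    steps = {"new": process_new, "changed": process_changed}
    for step in order:
        steps[step]()
    return albums, tracks

albums_new_first, tracks_new_first = scan(["new", "changed"])
albums_changed_first, tracks_changed_first = scan(["changed", "new"])
print(len(albums_new_first), len(albums_changed_first))
```

In this model, running the "new" step first leaves two albums with the same title, while running the "changed" step first leaves a single album holding all three tracks.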

@bobbydriver

Ah ok - that makes sense, not sure how you fix that. I guess if it scanned updated before new files that would bring its own problems?

And yes - I am on the latest 8.3 nightly (if it matters)

@michaelherger
Member

Yes, changing the processing order is the most obvious approach I'll investigate first.

@mherger mherger closed this as completed in 09f788d Feb 5, 2022
@michaelherger
Member

Please let me know should you encounter any new side-effects. Thanks for your help identifying this long-standing issue, @bobbydriver!

@mherger mherger self-assigned this Feb 5, 2022
@frank1969b

@mherger, GREAT, you fixed this, too! This has been an evergreen for me as well (it always happened when there was a new bonus edition of an album and I added the new bonus tracks to it - and this is often the case nowadays! :) )
THANKS!

@bobbydriver

Thanks Michael! Testing it tonight. will let you know

@bobbydriver

Looking good to me - the problem is gone. I didn't think this would ever get fixed so THANK YOU so much!

@michaelherger
Member

Good to know! Sometimes it needs a fresh mind to look into these old issues 😉.

@schnillerman
Author

Sorry to interrupt you again guys, but LMS 8.3.0 - 1644170574 @ Sun 06 Feb 2022 07:24:08 PM CET is still creating duplicates for me.

Use case: Capitalization change in tag albumartist and dir/file name

  • Tag: Jeff The Brotherhood > JEFF The Brotherhood
  • File name: Jeff The Brotherhood (Global Chakra Rhythms) - 01 - Global Chakra Rhythms.mp3 > JEFF The Brotherhood (Global Chakra Rhythms) - 01 - Global Chakra Rhythms.mp3
  • Folder name: ..\Jeff The Brotherhood\Jeff The Brotherhood - (2015) Global Chakra Rhythms > ..\JEFF The Brotherhood\JEFF The Brotherhood - (2015) Global Chakra Rhythms

Re-scan results in 2 identical entries, both with all tracks:
[screenshot]

One of them without display of the artist name:
[screenshot]

One with display of the artist name:
[screenshot]

It seems that file name changes are registered as new files, too:
[screenshot]

Can share library.db if required.

@mherger
Contributor

mherger commented Feb 7, 2022

Did you change artist name in tag, folder and file name all at the same time? I haven't tried all three at once yet.

Did you completely delete library.db (not just wipe its content) in the past week? Some of the new behaviour requires some table schema to be updated/re-created from scratch.

Yes, I'd be interested in your library.db in its broken state: https://www.dropbox.com/request/T3RctyzGgNg0oFDubq6a

@schnillerman
Author

schnillerman commented Feb 7, 2022

I did not delete library.db; however, I did a full re-scan before I changed the files as described above.

Just dropped the library.db.

Now re-scanning with all files named library.* renamed and LMS restarted (triggered a re-scan).

I did the following:

  1. re-tagged the files
  2. re-scanned
  3. duplicate entries were added
  4. renamed the files
  5. duplicate entries remained

@michaelherger
Member

Thanks for the uploaded file. As you confirmed, it's not using the latest schema. It would still do case-sensitive comparisons under certain circumstances.
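
For readers wondering how a schema can influence case sensitivity: in SQLite, text comparisons use the column's collation, which is fixed when the table is created. The sketch below only demonstrates that general SQLite behaviour; whether the LMS schema change consisted of exactly this kind of collation is an assumption, and the two table names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE binary_names (name TEXT);                  -- default BINARY collation
CREATE TABLE nocase_names (name TEXT COLLATE NOCASE);   -- case-insensitive for ASCII
INSERT INTO binary_names VALUES ('Jeff The Brotherhood');
INSERT INTO nocase_names VALUES ('Jeff The Brotherhood');
""")

def count_matches(table):
    # Table names come from the fixed list above, so the f-string is safe here.
    sql = f"SELECT COUNT(*) FROM {table} WHERE name = ?"
    return conn.execute(sql, ("JEFF The Brotherhood",)).fetchone()[0]

binary_hits = count_matches("binary_names")
nocase_hits = count_matches("nocase_names")
print(binary_hits, nocase_hits)
```

The lookup misses the row in the table with the default collation and finds it in the `NOCASE` one, which is consistent with an old database file behaving differently from a freshly created one.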

@schnillerman
Author

I'll keep you updated as duplicates occur. For now, as the others already said: Huge thank you for dealing with this issue.

@bobbydriver

This has been working fine with everything I've thrown at it in terms of folder/file changes so far, but today found a test case that still gives duplication

  1. added a brand new folder with an album in it. Scanned into the database and it created 2 duplicate albums. This is because I had made an error in the album tag on one of the tracks so it was slightly different to the other tracks. So the behaviour was correct - so far

  2. went into the folder and corrected the album tag on the bad track so it matched the album tag of the others. Re-wrote the existing tags on all the files, so that they all got a new Date Modified. Did a rescan and it still shows up as 2 duplicate albums

  3. went back into the folder and this time changed the album title to something new - and applied to all tracks. Rescanned and it is still showing as 2 duplicate albums

Can you recreate this? I'm on today's latest nightly

@michaelherger
Member

Whenever you encounter such issues, please put aside a copy of your server.log, scanner.log and library.db and send them to me. This would greatly help me to better understand what's going on.

@michaelherger
Member

Ok, I've been able to reproduce this... Argh...

mherger added a commit that referenced this issue Feb 16, 2022
…ust rename its album, but have to merge it with the other tracks of the album, before (potentially) deleting the old album entry.
@michaelherger
Member

You wouldn't need to touch any of the other tracks. "fixing" one track's album name would be enough to trigger this issue: on the initial scan two albums would be created for those tracks. When we fix the album title, LMS would simply rename the second album's title, but not try to merge all tracks of that album.

The latest commit attempts to fix this situation: it would re-assign the fixed track to the existing correct album, then remove the old, incorrect album (if no track was left on it). Please give the next build another try.
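
The merge step described above (re-assign the fixed track to the existing correct album, then drop the old album if it is empty) can be sketched against a toy SQLite database. This is not the code from the commit: `retag_track`, the reduced `albums`/`tracks` tables and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE albums (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, album INTEGER);
INSERT INTO albums VALUES (1, 'Good Album'), (2, 'Good Albmu');  -- 2 came from a typo
INSERT INTO tracks VALUES (1, 'One', 1), (2, 'Two', 1), (3, 'Three', 2);
""")

def retag_track(conn, track_id, new_album_title):
    """Apply a corrected album tag to one track and return its album id."""
    old_album = conn.execute(
        "SELECT album FROM tracks WHERE id = ?", (track_id,)
    ).fetchone()[0]
    row = conn.execute(
        "SELECT id FROM albums WHERE title = ? AND id <> ?",
        (new_album_title, old_album),
    ).fetchone()
    if row is None:
        # No other album carries that title: renaming in place is enough.
        conn.execute(
            "UPDATE albums SET title = ? WHERE id = ?", (new_album_title, old_album)
        )
        return old_album
    target = row[0]
    # Merge: move the track to the existing album ...
    conn.execute("UPDATE tracks SET album = ? WHERE id = ?", (target, track_id))
    remaining = conn.execute(
        "SELECT COUNT(*) FROM tracks WHERE album = ?", (old_album,)
    ).fetchone()[0]
    if remaining == 0:
        # ... and remove the old album only if no track is left on it.
        conn.execute("DELETE FROM albums WHERE id = ?", (old_album,))
    return target

merged_into = retag_track(conn, 3, "Good Album")
albums_left = conn.execute("SELECT id, title FROM albums ORDER BY id").fetchall()
print(merged_into, albums_left)
```

Renaming album 2 in place would have left two albums titled "Good Album"; merging first and deleting the emptied album afterwards leaves exactly one.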

And thanks for your reports!

@bobbydriver

Tested at my end and that seems to have fixed it - thank you!

Will pipe up again if I find any more test fail scenarios, but fingers crossed that this is done now


5 participants