Warning! Content file comparison gives wrong results! #1015

Closed
Werve opened this issue Jul 7, 2022 · 11 comments
Labels
bug Bug reports. needs-reproduction A bug that has not been able to be reproduced.

Comments

@Werve

Werve commented Jul 7, 2022

Describe the bug
Files presented as equal by the standard content scan have different hashes!
I hope this hasn't been happening for long (maybe only in the latest version); otherwise, who knows how many files I have lost...

To Reproduce
Steps to reproduce the behavior:

  1. Select a folder as reference (in my case, drive I:)
  2. Select a subfolder of that reference folder as "Normal" (not reference); in my case, LOST.DIR
  3. Choose Standard mode and the Content scan type
  4. See that non-identical files are matched as duplicates

Expected behavior
Only identical files should be grouped and shown.

Screenshots
(screenshot not preserved)

Desktop (please complete the following information):

  • OS: Windows 10
  • Version 4.3
@Werve Werve added the bug Bug reports. label Jul 7, 2022
@arsenetar
Owner

I am not able to reproduce this issue following the steps, so going to need some additional information. What setting values do you have in options? Do you get the same issue if you Clear the Cache before running the scan? It may be helpful if you could turn on debug logging in options and attach the log after running the same scan.

@arsenetar arsenetar added needs-reproduction A bug that has not been able to be reproduced. needs-information More information needs to be collected. labels Jul 7, 2022
@Werve
Author

Werve commented Jul 7, 2022

I have since moved many files, so I can only reproduce it for one particular group of files. But I checked some of the results and they have different SHA-256 hashes.
In particular I noticed it always occurs with the LOST.DIR folder.
I have attached the debug files.
Clearing the cache didn't fix it.
These are the options I set (they are also in the debug log):
(screenshot of the options not preserved)

hash_cache.zip

@arsenetar
Owner

Thanks for the additional information and the debug log, the log definitely shows an issue happening with the files in the LOST.DIR folder. I have added a protection to prevent a possible cause of the log message from having files show as duplicates. This may potentially have been an issue since 4.2.0 or could be new in 4.3.0 depending on specifics.

I need some extra information to determine exactly what is going on that is causing the error in the log. I have made an updated version with the protection and additional debug output to help track this down. Would you be able to download and run the version from: https://voltaicideas.net/dupeGuru/TestPackages/4.3.1/dupeGuru_win64_4.3.1-dev.1.exe and again share the debug log?

arsenetar added a commit that referenced this issue Jul 8, 2022
- Add protection for empty hash digests in comparison of non-zero size
  files
- Bump version to 4.3.1-dev for identification
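The guard described in that commit can be illustrated with a minimal Python sketch. It assumes a hypothetical FileInfo record whose digest is None when the hash could not be obtained; it is not dupeGuru's actual code. The key point is that two failed lookups must not compare as equal, since None == None is True in Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    # Hypothetical record for illustration; not dupeGuru's real File class.
    path: str
    size: int
    digest: Optional[bytes]  # None when the digest could not be obtained


def contents_match(a: FileInfo, b: FileInfo) -> bool:
    """Return True only when both files are known to have identical contents."""
    if a.size != b.size:
        return False
    if a.size == 0:
        # Zero-size files are trivially identical; no digest is needed.
        return True
    # Guard: a missing (empty) digest on a non-zero size file never counts as
    # a match, otherwise two failed lookups (None == None) would look "equal".
    if not a.digest or not b.digest:
        return False
    return a.digest == b.digest
```

Without the guard, two same-size files that both failed to produce a digest would be grouped as duplicates, which matches the symptom reported above.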
@Werve
Author

Werve commented Jul 8, 2022

Oh no, I've been using dupeGuru for the last week on some backups to restore a microSD card. Is it possible to tell from the debug log how many times the bug probably happened?

Unfortunately, with a deadline tomorrow, I did not have time to check the files and assumed they were identical. Because I was short on time, I can no longer reproduce the problem: I no longer have that set of files.

@arsenetar
Copy link
Owner

Yeah, the debug log would contain lines with "WARNING - Couldn't get digest" for all potentially affected files. Even then, only files that both hit this warning and had the same file size would have been marked as duplicates (when they may not be).
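Counting those entries is a simple text search. The following is a minimal Python sketch that assumes each affected entry contains the text "WARNING - Couldn't get digest" verbatim; the rest of the log line format is not shown in this thread and is not relied on.

```python
from typing import Iterable, List

# Assumed marker text, taken from the warning quoted in this thread.
WARNING_MARKER = "WARNING - Couldn't get digest"


def affected_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mark a file whose digest could not be obtained."""
    return [line.rstrip("\n") for line in log_lines if WARNING_MARKER in line]


def count_affected(log_lines: Iterable[str]) -> int:
    """Count potentially affected files (one warning line per file is assumed)."""
    return len(affected_lines(log_lines))
```

Usage, with "debug.log" as a placeholder for wherever the log file actually lives: `with open("debug.log", encoding="utf-8", errors="replace") as f: print(count_affected(f))`. Since only same-size files that both hit the warning could be mismatched, this count is an upper bound on the files worth re-checking.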

@Werve
Author

Werve commented Jul 8, 2022

Thank you for your quick reply.
Out of curiosity, why would a file have no hash? Is it a behavior of xxhash?
Or maybe it is related to the dot in LOST.DIR? (I'm hoping the bug only affected that folder.)

@arsenetar
Copy link
Owner

There is an issue looking up the hash in the cache db, which causes an exception (and that triggers the log message). The digest is None because the file was never hashed or found in the db due to the exception. I am not sure which of the two values (stat.st_size, stat.st_mtime_ns) is triggering the cache db issue, although I have suspicions. I am going to rework some of that code so that cache db issues do not stop hashing, in addition to the guard I had already added.
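The rework described here, where cache db failures no longer stop hashing, could look roughly like the Python sketch below. Everything in it is an assumption for illustration: HashCache and get_digest are hypothetical names, the table layout is invented, and hashlib.sha256 stands in for xxhash so the example needs only the standard library. It is not dupeGuru's implementation.

```python
import hashlib
import logging
import sqlite3
from typing import Optional

log = logging.getLogger(__name__)


class HashCache:
    """Hypothetical sqlite-backed digest cache keyed by (path, size, mtime_ns)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest BLOB)"
        )

    def get(self, path: str, size: int, mtime_ns: int) -> Optional[bytes]:
        row = self.conn.execute(
            "SELECT digest FROM files WHERE path = ? AND size = ? AND mtime_ns = ?",
            (path, size, mtime_ns),
        ).fetchone()
        return row[0] if row else None

    def put(self, path: str, size: int, mtime_ns: int, digest: bytes) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
            (path, size, mtime_ns, digest),
        )


def get_digest(cache: HashCache, path: str, size: int, mtime_ns: int) -> bytes:
    # 1. Try the cache, but never let a cache failure prevent hashing.
    try:
        cached = cache.get(path, size, mtime_ns)
        if cached is not None:
            return cached
    except Exception as exc:
        log.warning("Cache lookup failed for %s: %s", path, exc)
    # 2. Fall back to hashing the file contents in 1 MiB chunks
    #    (sha256 here is only a stand-in for xxhash).
    h = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(1024 * 1024), b""):
            h.update(chunk)
    digest = h.digest()
    # 3. Best-effort cache write; a failure is logged, not raised.
    try:
        cache.put(path, size, mtime_ns, digest)
    except Exception as exc:
        log.warning("Cache store failed for %s: %s", path, exc)
    return digest
```

One concrete way a stat value can break such a lookup is that Python's sqlite3 module refuses integers outside the signed 64-bit range. Whether that is what happened with these files is not established in this thread.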

@Werve
Author

Werve commented Jul 8, 2022

Oh, (just a guess, since I don't know the code) stat.st_mtime_ns may be connected to the fact that the CHK files do not have a valid date (it is earlier than the Unix epoch; other software has also reported errors about it).
I noticed that the debug log contains lines from before I enabled the debug option in the GUI. Does that mean I should also find any errors that occurred in recent days, before the debug option was checked?

If that is true, it seems I'm safe: it only occurred in the LOST.DIR folder and not in the whole backup.

@arsenetar
Copy link
Owner

Okay, if that is the case with those chk files, then that probably explains the source of the issue. The log will contain entries which were at a warning or higher level (such as WARNING, ERROR) even without debug checked. Debug just enables the lower severity levels.
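This threshold behavior is how Python's standard logging module works: a logger set to WARNING drops DEBUG and INFO records but keeps WARNING and ERROR. The sketch below is a minimal illustration; make_logger is a hypothetical helper, and the "%(levelname)s - %(message)s" format is only chosen to resemble the "WARNING - ..." lines quoted in this thread, not taken from dupeGuru's configuration.

```python
import io
import logging


def make_logger(name: str, debug_enabled: bool, stream: io.StringIO) -> logging.Logger:
    """Hypothetical setup: WARNING and above always logged, DEBUG only when enabled."""
    logger = logging.getLogger(name)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep records out of the root logger
    logger.setLevel(logging.DEBUG if debug_enabled else logging.WARNING)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
    logger.addHandler(handler)
    return logger
```

For example, with debug disabled, calling `.debug("scanning")` and then `.warning("Couldn't get digest")` on the returned logger writes only the WARNING line to the stream. This is why warnings from earlier scans were already present in the log before the debug option was turned on.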

@Werve
Author

Werve commented Jul 8, 2022

Thanks! Since I read the log and the only warnings of that type are for the LOST.DIR folder, my backup is safe!
Really glad to know. I've had, and still have, so many tech problems; at least this didn't go (too) wrong.

@arsenetar arsenetar mentioned this issue Jul 8, 2022
@arsenetar arsenetar removed the needs-information More information needs to be collected. label Jul 8, 2022
@arsenetar
Owner

Going to close this as it is resolved in the latest changes, a release will be out in the near future.
