Bugfix: deleted item re-appears upon next import of URLs #433
Comments
@cdvv7788 the timestamp > delete version will be fixed automatically once we remove the JSON main index. Don't bother fixing it for now; it would just add a bunch of workaround complexity for a problem that's going away soon anyway.
Ok. Please leave this open so we don't forget to check back once we merge the index changes.
@mauvity can you please check if the current version on master fixes it? We refactored the index internals.
There is still a functional difference between the two ways:
Oh right, the delete functionality has not been touched in the refactor.
@pirate what should we do about this? Maybe add a confirmation and change both methods to remove the actual files? If the admin is a way to maintain the index, leaving orphaned folders may be unnecessary.
I think removing the delete button from the snapshot admin detail page is enough for now. (Leave the delete button on the list page the way it is now.)
@cdvv7788 is this fixed in v0.5.0? If not, can we do that?
I'm pretty sure this was already fixed in v0.5.6. Comment back here if you're still seeing the issue and I'll reopen the ticket.
The bug is re-appearing in ArchiveBox v0.7.1. It is quite odd to see a new import full of entries that were deleted earlier. I've just observed another bug, which could be related: a handful of deleted entries re-appeared at the top of the list with newer dates. These entries weren't indexed yet; I suspect the extractor already had them in the queue and inserted them back as it went through them. cc: @pirate
@235 Can you confirm this is happening when you delete an older completed Snapshot that does not have the same URL present in a later import? Deleting does not prevent a URL from being re-added in the future, so if you deleted some Snapshots and then re-imported the same URLs later on, they will re-appear (as new Snapshot entries). Deleting during an import is also totally broken/not advised. This is the downside of making all my import code immutable/idempotent (it overwrites entries entirely on changes instead of mutating them in-place). Because Snapshots are operated on in-memory, the import rewrites the DB and disk entries several times from memory as it does work, and as long as a Snapshot is still in-memory being operated on, the import doesn't notice when a user deletes the DB/disk entry out from underneath it.
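To illustrate the delete-during-import behaviour described above, here is a minimal sketch. It uses a plain dict in place of the real DB/disk index, and every name in it (`db`, `save_snapshot`, `run_import`, `user_deletes`, the step names) is hypothetical and not ArchiveBox's actual code. It only shows why an overwrite-style write from an in-memory copy brings a deleted entry back.

```python
# Hypothetical stand-in for the index: URL -> snapshot record.
db = {}

def save_snapshot(snapshot):
    """Overwrite the whole entry (idempotent write, no in-place mutation)."""
    db[snapshot["url"]] = dict(snapshot)

def run_import(snapshot, steps, on_step=None):
    """Work on an in-memory snapshot, rewriting the DB entry after each step."""
    for step in steps:
        # Build a new record instead of mutating the old one.
        snapshot = {**snapshot, "done": snapshot["done"] + [step]}
        # The write never checks whether the entry was deleted in the meantime.
        save_snapshot(snapshot)
        if on_step:
            on_step(step)

def user_deletes(url):
    """Simulate a user deleting the entry while the import is still running."""
    def hook(step):
        if step == "title":
            db.pop(url, None)
    return hook

url = "https://example.com"
run_import({"url": url, "done": []}, ["title", "wget"], on_step=user_deletes(url))

# The entry is back, because the second step rewrote it from memory.
reappeared = url in db
```

In this sketch the only way to avoid the re-appearance would be for the write step to re-check the index before saving, which is exactly the check the comment above says the import does not make.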
As discussed in the other ticket, this was deletion DURING an import. We can ignore the report here and focus on the other ticket discussion. TY!
Thank you in advance for your help,
Sorry if this isn't experienced universally and it's just something I'm not doing right 😕
Describe the bug
Deleted item is re-imported upon the next import of (unrelated) URLs
Steps to reproduce
Software versions